AD-A096  452 

UNCLASSIFIED 


MARYLAND  UNIV  COLLEGE  PARK  DEPT  OF  COMPUTER  SCIENCE  F/G  9/2 

AN  EXPERIMENTAL  INVESTIGATION  OF  COMPUTER  PROGRAM  DEVELOPMENT  A— ETC(U) 
DEC  79  R  V  REITER  AFOSR-77-3181 

TR-853  AFOSR-TR-81-0214  NL 


COMPUTER SCIENCE
TECHNICAL REPORT SERIES



UNIVERSITY  OF  MARYLAND 

COLLEGE  PARK,  MARYLAND 
20742 




Technical  Report  TR-853 


December  1979 

An  Experimental  Investigation  of 
Computer  Program  Development  Approaches 
and  Computer  Programming  Metrics* 

by 

Robert  William  Reiter,  Jr. 


AIR FORCE OFFICE OF SCIENTIFIC RESEARCH (AFSC)
NOTICE OF TRANSMITTAL TO DDC
This technical report has been reviewed and is
approved for public release IAW AFR 190-12 (7b).
Distribution is unlimited.
A. D. BLOSE
Technical Information Officer


Dissertation  submitted  to  the  Faculty  of  the  Graduate  School 
of  the  University  of  Maryland  in  partial  fulfillment 
of  the  requirements  for  the  degree  of 
Doctor  of  Philosophy 
1979 

*Research  supported  in  part  by  the  Air  Force  Office  of  Scientific 
Research  Grant  AFOSR-77-3181.  Computer  time  supported  in  part 
through  the  facilities  of  the  Computer  Science  Center  of  the 
University  of  Maryland. 


REPORT DOCUMENTATION PAGE

TITLE (and Subtitle): An Experimental Investigation of Computer
Program Development Approaches and Computer Programming Metrics

AUTHOR: Robert William Reiter, Jr.

CONTRACT OR GRANT NUMBER: AFOSR-77-3181

PERFORMING ORGANIZATION NAME AND ADDRESS: University of Maryland,
Department of Mathematics, College Park, Md. 20742

PROGRAM ELEMENT, PROJECT, TASK: 61102F 2304/A2

CONTROLLING OFFICE NAME AND ADDRESS: Air Force Office of Scientific
Research/NM, Bolling AFB, Washington, DC 20332

REPORT DATE: December 1979

SECURITY CLASS: UNCLASSIFIED

DISTRIBUTION STATEMENT (of this Report): Approved for public
release; distribution unlimited.

ABSTRACT:

There is a need in the emerging field of software engineering for
empirical study of software development approaches and software metrics.
An experiment has been conducted to compare three programming environments:
individual programming under an ad hoc approach, team programming under an
ad hoc approach, and team programming under a disciplined methodology. This
disciplined methodology integrates the use of top-down design, process design
language, structured programming, code reading, and chief programmer team
organization. Data was obtained for a large number of automatable software
metrics characterizing the software development process and the
developed software product. The results reveal several statistically
significant differences among the programming environments on the basis
of the metrics. These results are interpreted as demonstrating the
advantages of disciplined team programming in reducing software
development costs relative to ad hoc approaches and improving software
product quality relative to undisciplined team programming.


ABSTRACT


Title  of  Dissertation:  An  Experimental  Investigation  of 

Computer  Program  Development  Approaches 
and  Computer  Programming  Metrics 

Robert  William  Reiter,  Jr.,  Doctor  of  Philosophy,  1979 

Dissertation  directed  by:  Dr.  Victor  R.  Basili 

Associate  Professor 
Department  of  Computer  Science 

There  is  a  need  in  the  emerging  field  of  software  engineering 
for  empirical  study  of  software  development  approaches  and  software 
metrics.  An  experiment  has  been  conducted  to  compare  three 
programming  environments:  individual  programming  under  an  ad  hoc 
approach,  team  programming  under  an  ad  hoc  approach,  and  team 
programming  under  a  disciplined  methodology.  This  disciplined 
methodology  integrates  the  use  of  top-down  design,  process  design 
language,  structured  programming,  code  reading,  and  chief  program- 


ACKNOWLEDGMENTS 


This  work  was  supported  in  part  by  the  Air  Force  Office 
of  Scientific  Research  through  grant  AFOSR-77-3181A  to  the 
University  of  Maryland.  Computer  time  was  provided  in  part 
through  the  facilities  of  the  Computer  Science  Center  of  the 
University  of  Maryland. 

This work could not have been accomplished without the
cooperation  and  assistance  of  others.  To  students  who 
participated  in  the  experiment,  colleagues  who  offered 
helpful  suggestions,  and  faculty  who  reviewed  the  work 
critically,  I  am  most  grateful.  Drs.  Richard  G.  Hamlet  and 
Ben  A.  Shneiderman  critiqued  this  manuscript  thoroughly  on 
the  basis  of  "programming  sense,"  experimental  procedure/ 
terminology,  and  writing  style.  Drs.  Marvin  V.  Zelkowitz 
and  John  D.  Gannon  imparted  a  healthy  sense  of  reality  and 
provided  an  appropriate  measure  of  stimulation/inspiration 
throughout  their  lengthy  service  as  members  of  my  study 
committee. 

I  am  indebted  beyond  measure,  however,  to  two  people 
whose  professional  contribution  and  personal  sacrifice  have 
continually  enriched  my  work  as  well  as  my  life.  I  thank  my 
advisor, Dr. Victor R. Basili, for his expert guidance and
patient  encouragement.  I  thank  my  wife,  Lowrie  Ebbert 
Reiter,  for  her  unselfish  support  and  unfailing  love. 




TABLE OF CONTENTS

Chapter

I. INTRODUCTION AND OVERVIEW

II. BACKGROUND AND RELATED RESEARCH
    Software Development Approaches
    Software Metrics
    Empirical/Experimental Study

III. INVESTIGATION SPECIFICS
    Surroundings
    Experimental Design
    Programming Methodologies
    Data Collection and Reduction
    Programming Aspects and Metrics

IV. GLOSSARY OF PROGRAMMING ASPECTS

V. DISCUSSION OF ELABORATIVE METRICS
    Program Changes
    Cyclomatic Complexity
    Data Bindings
    Software Science Quantities

VI. INVESTIGATIVE TECHNIQUE
    Step 1: Questions of Interest
    Step 2: Research Hypotheses
    Step 3: Statistical Model
    Step 4: Statistical Hypotheses
    Step 5: Research Frameworks
    Step 6: Experimental Design
    Step 7: Collected Data
    Step 8: Statistical Test Procedures
    Step 9: Statistical Results
    Step 10: Statistical Conclusions
    Step 11: Research Interpretations

VII. OBJECTIVE RESULTS
    Presentation
    Impact Evaluation
    A Relaxed Differentiation View
    A Directionless View
    Individual Highlights

VIII. INTERPRETIVE RESULTS
    According to Basic Suppositions
    According to Programming-Aspect Classification
    Miscellaneous

IX. SUMMARY AND CONCLUSIONS

Appendix

1. Statistical Description of Raw Scores

References

LIST OF TABLES

Table

1. Programming Aspects
2. Statistical Conclusions
3. Statistical Impact Evaluation
4.1 Non-Null Conclusions for Location Comparisons,
    arranged by outcome
4.2 Non-Null Conclusions for Dispersion Comparisons,
    arranged by outcome
5.1 Relaxed Differentiation for Location Comparisons
5.2 Relaxed Differentiation for Dispersion Comparisons
6.1 Conclusions for Class I,
    Effort (Job Steps)
6.2 Conclusions for Class II,
    Errors (Program Changes)
6.3 Conclusions for Class III,
    Gross Size
6.4 Conclusions for Class IV,
    Control-Construct Structure
6.5 Conclusions for Class V,
    Data Variable Organization
6.6 Conclusions for Class VI,
    Packaging Structure
6.7 Conclusions for Class VII,
    Invocation Organization
6.8 Conclusions for Class VIII,
    Communication via Parameters
6.9 Conclusions for Class IX,
    Communication via Global Variables


LIST OF FIGURES

Figure

1. Frequency Distribution of Cyclomatic Complexity
2. Investigative Methodology Schematic
3.1 Lattice of Possible Directional Outcomes
    for Three-way Comparison
3.2 Lattice of Possible Nondirectional Outcomes
    for Three-way Comparison
4. Association Chart for Results and Conclusions


CHAPTER  I 


INTRODUCTION AND OVERVIEW


In the evolution of a systematic body of knowledge,
there are generally three phases of validation. The first
phase is the logical development of the theory based on a
set of sound principles. This is followed by the
application of the theory and the gathering of evidence that
the theory is applicable in practice. This usually involves
some qualitative assessment in the form of case studies.
The final phase is the empirical and experimental analysis
of the applied theory in order to further understand its
effects and better demonstrate its advantages in a
controlled manner. This usually requires quantitative
measurement of the relevant phenomena.

Much has been written about methodologies for
developing computer software [Wirth 71; Dahl, Dijkstra &
Hoare 72; Jackson 75; Myers 75; Linger, Mills & Witt 79].
Most of these methodologies are based on sound logical
principles. Case studies have been conducted to demonstrate
their effectiveness [Baker 75; Basili & Turner 75]. Their
adoption within production ("real-world") environments has
generally been successful. Having practiced adaptations of
these methodologies, software designers and programmers have
asserted that they got the job done faster, made fewer
errors, or produced a better product. Unfortunately, solid
quantitative evidence that comparatively assesses any
particular methodology is scarce [Shneiderman et al. 77;
Myers 78]. This is due partially to the cost and
impracticality of a valid experimental setup within a
production environment.

Thus the question remains: are measurable benefits
derived from programming methodologies, with respect to
either the software development process or the developed
software product? Even if the benefits are real, it is not
clear that they can be quantified and effectively monitored.
Software development is still too artistic, in the aesthetic
or spontaneous sense. In order to understand it more fully,
manage it more effectively, and adapt it to particular
applications or situations, software development must become
more scientific, in the engineering and calculated sense.
More empirical study, data collection, and experimental
analysis are required to achieve this goal.

This dissertation strives to contribute to software
engineering research in this vital third phase of
validation. The dissertation reports on an original
research project dealing with three "dimensions" of software
engineering:

Software development approaches, i.e., programming
methodologies and environments for developing software;

Software metrics, i.e., quantifiable aspects of
programming and measurements of software characteristics;

Empirical/experimental study, i.e., the collection and
statistical analysis of empirical data about software
phenomena, including controlled psychological
experimentation.

The immediate goals of the project were
(a) to investigate the effect of certain programming
methodologies and environments upon software
development phenomena,
(b) to investigate the behavior of certain quantifiable
programming aspects and software measurements under
different approaches to software development, and
(c) to devise and apply an investigative methodology,
founded on established principles of experimental
research, but tailored for application to software
engineering.

The project employed the investigative methodology to
conduct and analyze a controlled experiment with software
development approaches as independent variables and software
metrics as dependent variables. In this way, both the
effect of the software development approaches and the
behavior of the software metrics were investigated
scientifically.

In regard to software development approaches, the
project focused on three distinct approaches, or programming
environments: single programmers using an ad hoc approach,
programming teams using an ad hoc approach, and programming
teams using a disciplined methodology. These approaches may
be characterized according to two human-factors issues: the
size of the programming "team" deployed and the degree of
methodological discipline employed.

In terms of team size, individual programmers working
alone were compared to teams of three programmers working
together. In terms of methodological discipline, an ad hoc
approach allowing programmers to develop software without
externally imposed methodological constraints was compared
to a disciplined methodology obliging programmers to follow
certain modern programming practices and procedures. This
disciplined methodology consisted of an integrated set of
software development techniques and team organizations
including top-down design, process design language,
structured programming, code reading, and chief programmer
teams.

It should be noted that the terms "methodology" and
"methodological" (in reference to software development) are
used to connote an integrated set of development techniques
as well as team organizations, rather than a particular
technique or organization in isolation. Part of the
philosophy behind the project is the belief that, while
particular techniques or organizations may generate marginal
benefits individually, only a comprehensive ensemble can
ensure significant gains in software development
productivity and reliability.

In regard to software metrics, the project focused on
the direct quantification of software development phenomena
via a host of nearly two hundred programming aspects and
measurements. Attention was consciously restricted to
metrics exhibiting certain desirable characteristics: all of
the software metrics examined in the study are quantitative
(on at least an interval scale [Stevens 46]), objective
(free from inaccuracy due to human subjectivity),
unobtrusive (to those developing the software), and
automatable (not dependent on human agency for computation).

This large set of programming aspects may be
dichotomized on the basis of other criteria. Some of the
aspects pertain to the software development process; others,
to the developed software product. For example, the number
of times that source code modules are compiled during the
development period is a process measure, while the number of
IF statements in the delivered program source code is a
product measure. Some of the aspects are rudimentary, in
that they pertain to very simple surface features or lack
theoretical models to motivate intuitive appeal; others are
elaborative, in that they aim at more complicated underlying
features or possess provocative theoretical models. For
example, the measurements mentioned above are both
rudimentary, while the program changes metric [Dunsmore &
Gannon 77] and the cyclomatic complexity metric [McCabe 76]
are elaborative.
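
The contrast between a rudimentary and an elaborative product measure can be sketched in a few lines. The example below is illustrative only: it analyzes Python source with the standard `ast` module, which is an assumption made for convenience and not the language or tooling used in the experiment. The function names, the `SAMPLE` program, and the choice of which constructs count as decisions are all hypothetical. The complexity function uses one common simplification of the cyclomatic measure (decision points plus one) and treats the whole source text as a single unit.

```python
import ast

def count_if_statements(source: str) -> int:
    """Rudimentary product measure: number of `if` statements.

    Note that Python's parser represents each `elif` as a nested `if`,
    so an `elif` branch is counted as one more `if` statement.
    """
    tree = ast.parse(source)
    return sum(isinstance(node, ast.If) for node in ast.walk(tree))

def cyclomatic_complexity(source: str) -> int:
    """Simplified cyclomatic measure: decision points plus one.

    Assumption: only `if`, `while`, and `for` are treated as decisions;
    boolean operators and exception handlers are ignored in this sketch.
    """
    tree = ast.parse(source)
    decisions = sum(
        isinstance(node, (ast.If, ast.While, ast.For))
        for node in ast.walk(tree)
    )
    return decisions + 1

# Hypothetical program text used only to exercise the two measures.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

if __name__ == "__main__":
    print("IF statements:", count_if_statements(SAMPLE))
    print("Cyclomatic complexity:", cyclomatic_complexity(SAMPLE))
```

Both functions need only the delivered source text, which is what makes measures of this kind objective, unobtrusive, and automatable in the sense described above.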

In regard to empirical/experimental study, the project
combined both empirical data collection and controlled
psychological experimentation in a laboratory-like setting.

The project involved extensive observation of forty-
five programmers developing working software systems,
averaging twelve hundred lines of code each, from scratch
during a five-week period. These programmers were divided
into three disjoint groups of "teams," each following one of
the three software development approaches mentioned above.
Multiple replications of a specific software development
task were performed independently and concurrently within
each group under conditions as otherwise identical as
possible.

In addition to some subjective qualitative observation
via questionnaires, interviews, etc., objective quantitative
observation was achieved by automatically and unobtrusively
monitoring the computer activities of the programming
"teams." For each replication, successive versions of the
software being developed by that "team" were captured in an
historical data bank that recorded details of the
development process and product. Raw scores for the
software metrics mentioned above were extracted from the
data bank and summarized via simple descriptive statistics.
Specifically, the mean values and standard deviations
observed within each group on the various quantifiable
programming aspects constitute the immediate results of the
project as an empirical data collection effort.
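
The descriptive summary described above (a mean and a standard deviation per group for each metric) can be sketched as follows. The group labels, the metric, and every score in `RAW_SCORES` are made-up placeholders for illustration and are not data from the experiment. The use of the sample standard deviation is likewise an assumption.

```python
from statistics import mean, stdev

# Hypothetical raw scores for one metric (say, computer job steps).
# Labels and values are illustrative assumptions, not experimental data.
RAW_SCORES = {
    "ad hoc individuals": [10, 12, 14],
    "ad hoc teams": [20, 25, 30],
    "disciplined teams": [8, 9, 10],
}

def describe(scores_by_group):
    """Return {group: (mean, sample standard deviation)} for one metric."""
    return {
        group: (mean(scores), stdev(scores))
        for group, scores in scores_by_group.items()
    }

if __name__ == "__main__":
    for group, (avg, sd) in describe(RAW_SCORES).items():
        print(f"{group:20s} mean={avg:6.2f}  sd={sd:5.2f}")
```

Repeating `describe` once per metric would yield a table of group means and standard deviations of the kind the text refers to as the immediate results of the data collection effort.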

The project followed a preplanned experimental design in
which extraneous factors were held constant wherever
possible, to ensure that differences in the software metrics
would be attributable to the different software development
approaches. The metrics' raw scores were analyzed using
nonparametric inferential statistics to obtain an objective
conclusion for each measured aspect. As precise statements
of the statistically significant differences observed among
the three programming environments on the basis of the
measured aspects, these objective conclusions constitute the
immediate results of the project as a controlled experiment.
By testing for differences in either the location (expected
value) or the dispersion (variability) of the software
metrics, the experiment addressed both the expectancy and
predictability of software development phenomena.
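
A nonparametric comparison of two groups on one metric can be illustrated with a rank-based statistic. This chapter does not name the specific tests used, so the Mann-Whitney U statistic below is a commonly used stand-in chosen purely for illustration, not the dissertation's actual procedure. The function names and the idea of comparing dispersion by applying the same statistic to absolute deviations from each group's median are also assumptions of this sketch.

```python
from itertools import combinations
from statistics import median

def mann_whitney_u(x, y):
    """U = number of (xi, yj) pairs with xi < yj; ties count one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def exact_two_sided_p(x, y):
    """Exact permutation p-value for a location difference.

    Enumerates every way of relabeling the pooled scores into groups of
    the original sizes and reports the fraction whose U lies at least as
    far from its null mean as the observed U. Only practical for the
    small samples typical of a replicated experiment.
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    center = n_x * len(y) / 2.0
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        px = [pooled[i] for i in idx]
        py = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(px, py) - center) >= observed:
            extreme += 1
    return extreme / total

def abs_deviations(scores):
    """Absolute deviations from the group median (a dispersion proxy)."""
    m = median(scores)
    return [abs(v - m) for v in scores]

if __name__ == "__main__":
    # Hypothetical scores for two groups; not experimental data.
    group_a = [1, 2, 3]
    group_b = [4, 5, 6]
    print("U =", mann_whitney_u(group_a, group_b))
    print("p =", exact_two_sided_p(group_a, group_b))
```

Under these assumptions, a location comparison passes the raw scores to `exact_two_sided_p`, while a rough dispersion comparison passes `abs_deviations(group_a)` and `abs_deviations(group_b)` instead.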

The experiment combined elements of both confirmatory
and exploratory data analysis. Some so-called confirmatory
programming aspects had been earmarked as promising
indicators of important software characteristics in advance
of conducting the experiment. Hypotheses had been
formulated, on the basis of the programming environments'
supposed effects, regarding the expected objective
conclusions for these confirmatory aspects. The project
included other so-called exploratory programming aspects in
order to investigate the software development process and
product more thoroughly.

The project was concerned with investigating an entire
software development project of nontrivial size in a quasi-
realistic setting. The experiment was conducted within an
academic environment in a laboratory or proving-ground
fashion so that an adequate experimental design could be
achieved while simulating a production environment. In this
way, the project reached a reasonable compromise between
"toy" experiments, which facilitate elaborate experimental
designs but often suffer from artificiality, and
"production" experiments, which offer industrial realism but
incur prohibitively high costs.

The project's basic premise was that distinctions among
these programming environments exist both in the process and
in the product. With respect to the developed software
product, the disciplined team should approximate the
individual programmer or at least lie somewhere between the
individual programmer and the ad hoc team, with regard to
product characteristics (such as number of decisions coded
and global data accessibility). This is because the
disciplined methodology should help the team act as a
mentally cohesive unit during the design, coding, and
testing phases. With respect to the software development
process, the disciplined team should have advantages over
both individuals and ad hoc teams, displaying superior
performance on cost-related factors such as computer usage
and number of errors made. This is because of the
discipline itself and because of the ability to use team
members as resources for validation.

The study's findings revealed several programming
characteristics for which statistically significant
differences do exist among the groups. The disciplined
teams used fewer computer runs and apparently made fewer
errors during software development than either the
individual programmers or the ad hoc teams. The individual
programmers and the disciplined teams both produced software
with essentially the same number of decision statements, but
software produced by the ad hoc teams contained greater
numbers of decision statements. For no characteristic was
it concluded that the disciplined methodology impaired the
effectiveness of a programming team or diminished the
quality of the software product.

The remainder of this dissertation is a comprehensive
report on the software engineering research project
introduced above. Chapter II reviews appropriate background
and related research from published literature. Chapter III
recounts specific details of the experiment itself. Chapter
IV briefly describes all of the programming aspects and
measurements, while Chapter V discusses the elaborative ones
in depth. Chapter VI depicts the investigative methodology
used to plan, execute, and analyze the experiment. Chapters
VII and VIII present the experiment's results, segregated
into objective findings and interpretative discussion,
respectively. Chapter IX summarizes the completed project,
draws general conclusions regarding its contribution to
software engineering, and mentions possible directions for
continued research in this area.


CHAPTER II


BACKGROUND AND RELATED RESEARCH


This chapter reviews the general background for this
research project and surveys related work published in the
open literature. For each of the three "dimensions" of
software engineering outlined in Chapter I, specific
instances of research in that area will be mentioned and
loosely characterized, in order to show appropriate
similarities and contrasts with this work. As a catalog of
related research, the chapter is intended to be merely
representative, not exhaustive.

Software Development Approaches

There has been considerable concern regarding
programming methodologies over the past decade since the
advent of structured programming and the dawning of software
cost consciousness. Software "practitioners" (i.e.,
programmers, designers, systems analysts, and managers) have
sought better ways to channel their energies toward
producing cost-effective, reliable software. Although a
broad spectrum of concerns (spanning all phases of the
software life-cycle and covering the full range of system
size and performance constraint) could be considered here,
attention has been restricted to methodology for
programming-in-the-small*: designing, implementing, and
testing computer programs to solve problems small enough to
be well-understood by a suitably trained individual. In
other words, the focus is on approaches for the kind of
software development that typical programmers/analysts in
typical software shops are accustomed to doing.


* As used here (and below), the meanings of the terms
'programming-in-the-small' and 'programming-in-the-large'
are clear from the context, but they differ slightly from
the meanings popularized by Dr. H.D. Mills.

A number of good ideas on how to develop software,
covering techniques for how to proceed as well as
organizations for managing people and communicating
information, have been (or are being) devised, demonstrated,
perfected, and accepted into everyday practice. Popular
examples include the following:

structured programming [Dahl, Dijkstra & Hoare 72;
Mills 72; Basili & Baker 77; Linger, Mills & Witt 79],
stepwise refinement [Wirth 71],
chief programmer teams [Baker 72; Baker 75; Brooks 75],
process design language (PDL) [Linger, Mills & Witt 79],
top-down design,
functional expansion,
design/code reading and walk-throughs [Fagan 76],
data abstraction/encapsulation and information hiding,
iterative enhancement [Basili & Turner 75; Turner 76],
the Michael Jackson method [Jackson 75; Hughes 79], and
composite design [Myers 75].

These approaches and their highly touted benefits have
been the subject of much written promotion and verbal
discussion. Indeed, several can boast of mathematical
foundations or formal explication to support their
underlying principles or mechanisms; for others, there are
extensive tutorials on how to apply them in practical
situations; and some have been embodied in programming
languages or packaged into automated tools. All of this
attention, plus the favorable experiences of software
practitioners, seems to indicate that these software
development approaches do succeed in improving the
efficiency of the development process or the quality of the
developed product to some degree.

But there is little empirical evidence to confirm the
advantages of these approaches or measure their benefits.
In several instances, case studies have been performed,
often in a pioneering spirit, to demonstrate particular
approaches; these case studies have usually involved
qualitative assessment, with only limited or uncontrolled
forms of quantitative assessment. Comparative assessment of
software development approaches is even rarer: only a few
controlled experiments [Shneiderman et al. 77; Myers 78]
have been conducted, and they have generally focused on the
use of particular techniques in isolation. The difficulty
of investigating the effects of software development
approaches stems precisely from the fact that they pertain
to the least understood and most expensive elements in
software engineering: human beings.

Software Metrics


There has been considerable interest in software
metrics over the past half decade in response to a growing
realization of how "invisible," imponderable, and
uncontrollable software can be. Software "scientists" have
been seeking ways to measure software phenomena. Broadly
interpreted, their efforts may be characterized as
attempting to quantify process efficiency and product
quality.* The software measurement domain extends from the
concrete details of a program, including its fine structure
and the resource expenditure required to produce it, to its
abstract characteristics: reliability, cost-effectiveness,
complexity, modularity, comprehensibility, modifiability, etc.


* This concept of product quality is meant to include
instantaneous, as well as evolutionary, considerations. The
former considerations pertain to both static (at compile
time) and dynamic (at execution time) features of a program,
as it exists at a given point along its life-cycle. The
latter considerations pertain to issues of software
maintenance and software management throughout the life-
cycle. The software measures in this dissertation address
product quality only in its instantaneous, static sense.

Because measurement is essential to most forms of
engineering, software metrics rightfully deserve a central
place within the emerging discipline of software
engineering. As in other technologies, the underlying
assumption is that appropriate measurement is the key to
effective control. It has been demonstrated [Gilb 77] that
the  general  concept  of  software  measurement  can  be  applied 
to  a  variety  of  programming  issues:  many  interesting 
suggestions  were  made  regarding  how  and  why  to  measure 
software.  But  the  metrics  discussed  by  Gilb  are  vaguely 
defined  and  superficial.  The  problem  is  that  meaningful 
measurement  of  software  is  extremely  difficult,  because  of 
software's  intricate  structure  of  concrete  detail  and 
because  of  the  tenuous  relationship  between  its  concrete 
details  and  abstract  characteristics.  An  additional  problem 
is  the  lack  of  well-understood  and  commonly  accepted 
terminology  to  describe  the  software  phenomena  to  be 
measured. 

However, a number of well-defined and fairly credible
software metrics have been proposed and evaluated, usually
in conjunction with a motivating model or some intuitional
underpinnings. The program changes metric [Dunsmore &
Gannon 77; Dunsmore 78] extracts an error count
algorithmically from the textual revisions made to source
code during program development. The cyclomatic complexity
metric [McCabe 76] counts the number of "basic" control-flow
paths in a program. The data bindings metric [Stevens,
Myers & Constantine 74; Basili & Turner 75; Turner 76]
counts communication paths between code segments via data
variables. The various metrics from software science theory
[Halstead 77]--program length, program volume, language
level, effort, etc.--provide a unified system of
measurements for the size of a program, the amount of
information it contains, the level of abstraction it
expresses, the amount of mental effort required to produce
or comprehend it, etc. The error-day metric [Mills 76] is
an index of how early errors are detected and corrected
during software development. The span metric [Elshoff 76b]
is an index of the extent to which a program's data
variables remain "live" (i.e., continue to affect control
flow and data value determination).
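
Two of the metrics named above have compact textbook
definitions, and a minimal sketch may make them concrete. It
assumes the usual formulations: McCabe's cyclomatic number
V(G) = E - N + 2P for a control-flow graph, and the basic
software science quantities computed from operator and
operand counts. The function names and the sample counts are
illustrative only and do not come from this report; the
precise counting rules used in this study are those given in
Chapter IV and may differ.

```python
import math


def cyclomatic_complexity(edges, nodes, components=1):
    """McCabe's cyclomatic number V(G) = E - N + 2P."""
    return edges - nodes + 2 * components


def software_science(n1, n2, N1, N2):
    """Basic software science measures.

    n1, n2: number of distinct operators / distinct operands
    N1, N2: total occurrences of operators / operands
    """
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    # Estimated program level; effort is volume divided by level.
    level = (2 / n1) * (n2 / N2)
    effort = volume / level
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "level": level,
        "effort": effort,
    }


# A single if-then-else: 4 nodes (test, then, else, join), 4 edges.
print(cyclomatic_complexity(edges=4, nodes=4))

# Hypothetical counts for a tiny program fragment.
print(software_science(n1=4, n2=4, N1=8, N2=8))
```

The operator and operand counts are the only inputs that
depend on the language, which is why such measures can be
gathered inside a compiler.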

Each metric mentioned above has been examined
empirically to one degree or another; but few software
metrics have been investigated in controlled experiments,
and there is little research comparing metrics or examining
their interrelationships empirically. Further elaboration
and discussion of individual software metrics is deferred to
Chapters IV and V since many were examined in this research
project.

Empirical Study

First-hand observation of software phenomena in the
"wild," so to speak, has long been regarded as a unique
source of information and the ultimate form of validation.
Ever since Knuth rummaged through wastebaskets at computer
centers for discarded listings of Fortran programs [Knuth
71], software "technicians" have been interested in watching
software be developed, to see how the latest intuitive
opinions or theoretical models fare against reality.
Ideally, it is useful to distinguish between data collection
efforts (with descriptive statistical analyses) and
controlled experimentation efforts (with inferential
statistical analyses); but, in practice, elements of both
are sometimes combined within the same empirical study.

Generally speaking, the purpose of data collection
efforts has been to examine the behavior of software metrics
and models under realistic conditions. A number of data
collection efforts have been aimed at
programming-in-the-large,* focusing on models of gross
behavior (i.e., cost, productivity, resource estimation)
during large- to medium-scale software development. At IBM
[Walston & Felix 77] data was collected via project
reporting forms in order to measure productivity on
production software developments. At NASA/Goddard [Basili
et al. 77] data is being collected via information forms in
order to evaluate cost or resource estimation models and to
study software error phenomena.

Other data collection efforts, focusing on small- to
medium-scale software development, have been aimed at
quantitatively characterizing software's fine structure. In
studies at GM [Elshoff 76b; Elshoff 76a], a large set of
commercial PL/1 programs was collected and measured
according to a host of quantifiable programming aspects and
software metrics, including the span metric and the software
science metrics.

Generally speaking, the purpose of controlled
experimentation efforts has been to evaluate the effects of
programming language features, human factors issues, and
programming methodologies upon software phenomena and
abstract characteristics. Usually, the language features
experiments are done from a computer scientist's viewpoint,
while the human factors experiments are done from a
psychologist's viewpoint. However, because of areas of
natural overlap between these two concerns, some
experiments fall into both categories. Together they
comprise the bulk of controlled experimentation in software
engineering.

* See earlier footnote.

There are several well-known examples of controlled
experimentation on programming language features. Weissman
[Weissman 74a; Weissman 74b] conducted experiments on how
programming features affect the psychological complexity of
software; the features included commenting, indentation,
mnemonic variable names, and control structures. Gannon
[Gannon 75; Gannon & Horning 75] conducted an experiment on
how programming language features affect software
reliability and the presence/persistence of errors; the
features included statement vs. expression orientation, data
variable scope conventions, and expression evaluation order.
Later, Gannon [Gannon 77] ran experiments to examine how
data typing conventions affect software reliability. Using
the same empirical data, Dunsmore [Dunsmore & Gannon 77;
Dunsmore 78] examined how programming "complexity" is
affected by programmer-controllable variations in
programming features. "Complexity" was measured
algorithmically by the program changes metric; the features
included statement nesting depth, frequency of data
references, and data communication mechanism preference.

There are several well-known examples of controlled
experimentation on human factors issues. Several
experiments [Sime, Green & Guest 73; Green 77] have been
conducted on the comprehensibility of different mechanisms
for implementing conditional branching. Several experiments
[Sheppard et al. 79] have been conducted on the effect of
modern coding practices, such as structured coding, mnemonic
variable names, and style of commenting, upon the ease of
performing comprehension, modification, and debugging tasks.

Finally, there are a few well-known examples of
controlled experimentation on programming methodologies.
Several experiments were conducted [Shneiderman et al. 77]
to evaluate the utility of detailed flowcharting (as a
design tool and documentation aid) in program composition,
comprehension, debugging, and modification tasks; novice
programming students were employed as subjects, with short
(i.e., less than 150 lines) Fortran programs as test
materials. Some experiments were also conducted [Myers 78]
to evaluate the utility of code reading and walkthroughs in
debugging tasks; experienced professional programmers were
employed as subjects, with a short PL/1 program as test
material. To date, however, controlled experimentation on
programming methodologies has been limited in scope.
Experimental studies have not involved programming
activities spanning multiple phases of the software
life-cycle and requiring the natural integration of multiple
programming tasks. Nor have experimental studies used
nontrivial test materials requiring sustained effort lasting
several weeks and involving several hundred lines of code.

This chapter outlines the surroundings in which the
experiment was conducted, the experimental design that was
employed, the programming methodologies that were compared,
the data collection and reduction that was performed, and
the programming aspects that were measured.

Surroundings


Several circumstances surrounding the experiment
contribute significantly to the context in which its results
must be appraised. These include the setting in which the
experiment was conducted, the people who participated as
subjects, the software development project that served as
the experimental task, the computer programming language in
which the software was written, and the computer system and
access mode that were used during development.

The experiment was conducted during the Spring 1976
semester, January through May, within regular academic
courses given by the Department of Computer Science on the
College Park campus of the University of Maryland. Two
comparable advanced elective courses were utilized, each
with the same academic prerequisites. The experimental task
and treatments were built into the course material and
assignments. Everyone in the two classes participated in
the experiment; they cooperated willingly and were aware of
being monitored, but had no knowledge of what was being
observed or why.

The participants were advanced undergraduate and
graduate students in the Department of Computer Science. On
the whole, they were reasonably competent computer
programmers, all having completed at least four semesters of
programming course work and some having as much as three
years' professional programming experience in government or
industry. Generally speaking, they were familiar with the
implementation language and the host computer system, but
inexperienced in team programming and the disciplined
methodology.

The programming application was a simple compiler,
involving string processing and translation (via scanning,
parsing, code generation, and symbol table management) from
an Algol-like language to zero-address code for a
hypothetical stack machine. The total task was to design,
implement, test, and debug the complete computer software
system from given specifications. The scope of the project
excluded both extensive error handling and user
documentation. The project was of modest but nonnegligible
difficulty, requiring between one and two man-months of
effort. The size of the resulting systems averaged over
1200 lines of high-level language source code. All facets
of the project itself were fixed and uniform across all
development "teams." Given the same specifications,
computer resource allocation, calendar time allotment, host
machine, implementation language, debugging tools, etc.,
each "team" worked independently to build its own system.
The delivered systems each ran (i.e., they worked) and
passed an independent acceptance test.

The implementation language was the high-level,
structured-programming language SIMPL-T [Basili & Turner].
This language was designed and developed at the
University of Maryland where it is taught and used
extensively in regular Department of Computer Science
courses. SIMPL-T contains the following control constructs:
sequence, ifthen, ifthenelse, whiledo, case, exit from loop,
and return from routine (but no goto). SIMPL-T allows
essentially three levels of data declaration scope (i.e.,
local to an individual routine, global across the several
routines of an individual module, or entry-global across the
routines of several modules), but routines may not be
nested. Adhering to a philosophy of "strong typing," the
language supports integer, character, and string data types
and single dimension array data structures. It provides the
programmer with automatic recursion and PL/1-like
string-processing capabilities. (Additional details regarding
the SIMPL-T programming language are interspersed among the
explanatory notes in Chapter IV.)

The host computer system was the campus-wide computing
facility, a Univac 1100 machine with the usual Exec 8
operating system. This system supports, in its fashion,
both batch access (via punch cards) and interactive
time-sharing* access (via TTY or CRT terminals). The
participants were well acquainted with the system and
accustomed to either access mode. During the experiment,
the participants were allowed to choose whichever access
mode they preferred and could switch freely between modes.
Almost everyone consistently preferred the interactive
access mode; only one person--in the AI group (see below),
by the way--used the batch access mode extensively.

Experimental Design

The major elements of an experimental design are its
units, treatment factors, treatment factor levels, observed
variables, local control, and management of extraneous
factors. (Cf. [Ostle and Mensing 75, chap. °] for a general
treatment of these elements.)


* Called "demand" in Univac terminology.

An experimental unit is that object to which a single
treatment is applied in one replication of the event known
as the "basic experiment." In this study, the "basic
experiment" was the accomplishment of a specific software
development project (see above), and the experimental unit
was the software development team (i.e., a small group of
people working together to develop the software). A total
of 19 replications of this "basic experiment," each
performed concurrently and independently by a separate
experimental unit, were involved in this experiment.

Most experiments are concerned with one or more
independent variables and the behavior of one or more
dependent variables as the independent variables are
permitted to vary. These independent variables are known as
experimental treatment factors. This experiment focused on
the approach used to develop software, as the single
experimental treatment factor.

Experiments usually involve some deliberate variation
in the experimental treatment factor(s). Different values
or classifications of the factor(s) are known as the
experimental treatment factor levels. In this experiment,
three levels were selected for the software development
approach factor. Conceived as variations in two
human-factors-in-programming issues, size of development
"team" and degree of methodological discipline, the
experimental treatment factor levels are denoted by the
following mnemonics:

AI -- individual programmers working alone, following
an ad hoc approach (see below);

AT -- teams of three programmers working together,
following an ad hoc approach (see below); and

DT -- teams of three programmers working together,
following a disciplined methodology (see below).

During an experiment, observations of the dependent
variable(s) are made for each experimental unit. An
experiment's immediate objective is to ascertain the
relationship between the experimental treatment factor
levels and the experimental observed variables. In this
experiment, the observed variables were quantifiable
programming aspects, or metrics (see below), of the software
development process or the developed software product. A
large set of such aspects were considered in the study.
Technically speaking, this amounted to conducting a series
of simultaneous univariate experiments, one for each
programming aspect, all sharing a common experimental design
and all based on the same empirical data sample.
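
The idea of "one univariate experiment per programming
aspect" can be sketched as a loop that applies the same
three-group test to each aspect's scores. The sketch below
is only an illustration: the passage above does not name the
statistical test used, so the Kruskal-Wallis H statistic
(without tie correction) is an assumption chosen for
brevity, and the aspect names and scores are invented
placeholders, not data from this study.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum * rank_sum / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Invented scores for two hypothetical aspects; 6, 6, and 7 units per group.
scores = {
    "job_steps": {
        "AI": [41, 55, 62, 48, 70, 59],
        "AT": [88, 95, 74, 102, 81, 90],
        "DT": [60, 52, 66, 58, 71, 49, 63],
    },
    "routines": {
        "AI": [30, 28, 35, 33, 29, 31],
        "AT": [32, 27, 36, 34, 30, 29],
        "DT": [40, 44, 38, 47, 42, 39, 45],
    },
}

# One univariate analysis per aspect, all on the same design.
results = {
    aspect: kruskal_wallis_h(list(by_group.values()))
    for aspect, by_group in scores.items()
}
print(results)
```

Because every aspect is analyzed on the same 19 units, the
individual results are not independent of one another, which
is worth keeping in mind when many aspects are examined.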

Experimental local control addresses the configuration
by which (a) experimental units are obtained, (b) units are
placed into groups, and (c) groups are subjected to
different experimental treatments (i.e., specific
combinations of experimental treatment factor levels).
Local control is employed in the design of an experiment in
order to increase its statistical efficiency or to improve
the sensitivity/power of statistical test procedures.
Experimental local control usually incorporates some form of
randomization--a basic principle of experimental design--
since it is necessary for the validity of statistical test
procedures.

For this experiment, subjects were obtained on the
basis of course enrollment: since the experiment was
embedded within two academic courses, every student enrolled
in those courses automatically participated in the
experiment. Software development "teams" were formed among
these subjects. In the one course, the students were
allowed to choose between segregating themselves as
individual programmers or combining with two other
classmates as three-person programming teams. In the other
course, the students were assigned (by the researcher) into
three-person teams. The two academic courses themselves
provided the variation in methodological discipline. The
atmosphere of the first course was conducive to an ad hoc
approach to programming, while the disciplined methodology
was stressed in the second course. In this manner, three
experimental treatments (corresponding to the three
experimental treatment factor levels AI, AT, and DT) were
created, and three groups of 6, 6, and 7 units
(respectively) were exposed to them.
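
As a compact restatement of the design described above, the
following sketch records the three treatment factor levels
and tallies units and participants. The class and field
names are illustrative. The participant total is derived
here under the assumption that every AT and DT unit had
exactly three members; it is not a figure quoted from the
text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentLevel:
    mnemonic: str      # AI, AT, or DT
    team_size: int     # programmers per experimental unit
    approach: str      # "ad hoc" or "disciplined"
    units: int         # number of experimental units in the group

    @property
    def participants(self):
        return self.team_size * self.units


DESIGN = [
    TreatmentLevel("AI", 1, "ad hoc", 6),
    TreatmentLevel("AT", 3, "ad hoc", 6),
    TreatmentLevel("DT", 3, "disciplined", 7),
]

total_units = sum(level.units for level in DESIGN)
total_participants = sum(level.participants for level in DESIGN)

for level in DESIGN:
    print(level.mnemonic, level.units, level.participants)
print(total_units, total_participants)
```

The unit total reproduces the 19 replications of the "basic
experiment" mentioned earlier.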

There are usually several extraneous factors, other
than the ones identified as experimental treatment factors,
that could influence the behavior being observed in an
experiment. Many experiments (including this one) follow a
reductionist paradigm, which seeks to control for all
variables except a select few, so that the effect of the
independent variables upon the dependent variables can be
isolated and measured. In this experiment, a variety of
programming factors which do affect software development
were given conscious consideration as extraneous variables:

- programming application and/or project
- project specifications
- implementation language
- calendar schedule
- available computer resources
- available automated tools

Wherever possible, these variables were held constant by
explicitly treating all experimental units in the same
manner.

Unfortunately, the ideal reductionist paradigm can only
be approximated, because of factors which are suspected of
strong influence on the behavior of interest, but which
cannot be explicitly controlled within the experimental
design. In this experiment, there were two such factors:
the personal ability/experience of the participants and the
amount of actual time/effort they (as students with other
classes and responsibilities) chose to devote to the
project. However, information from a pretest questionnaire
was used to balance the personal ability/experience of the
participants in the disciplined teams (only), by first
partitioning the group DT students into three equal-sized
categories (high, medium, low) based on their grades in
previous computer courses and their extracurricular
programming experience, and then randomly selecting one
student from each category to form each team.
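
The balancing procedure just described (rank, cut into three
equal strata, draw one member per stratum for each team) can
be sketched as follows. The student identifiers and ability
scores are made up, and the single numeric score is an
assumption standing in for the combination of grades and
experience taken from the questionnaire.

```python
import random


def form_balanced_teams(students, score, rng=None):
    """Form three-person teams with one high, one medium, one low member."""
    if len(students) % 3 != 0:
        raise ValueError("number of students must be divisible by three")
    if rng is None:
        rng = random.Random()
    ranked = sorted(students, key=score, reverse=True)
    k = len(ranked) // 3
    # Three equal-sized categories: high, medium, low.
    strata = [ranked[:k], ranked[k:2 * k], ranked[2 * k:]]
    for stratum in strata:
        rng.shuffle(stratum)
    # Team i takes the i-th randomly ordered member of each category.
    return [tuple(stratum[i] for stratum in strata) for i in range(k)]


# 21 hypothetical students with made-up ability scores -> 7 teams.
students = [f"s{i:02d}" for i in range(21)]
ability = {name: i for i, name in enumerate(students)}
teams = form_balanced_teams(students, ability.get, random.Random(0))
for team in teams:
    print(team)
```

Randomizing within each category keeps the assignment free of
researcher choice while still spreading ability evenly across
teams.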

For the statistical model employed to analyze this
experiment, it was necessary to assume homogeneity among the
participants with respect to personal factors such as
ability and/or experience, motivation, time and/or effort
devoted to the project, etc. As a reasonable measure of
individual programmer skill levels under the circumstances
of this study, the participants' grades from a particularly
pertinent prerequisite course provided a post-experimental
confirmation of at least one facet of this assumed
homogeneity: the distribution of these grades among the
three experimental groups would have displayed the same
degree of homogeneity as was actually observed in over 9 out
of 10 purely random assignments of the participants to the
groups. If anything, in the researcher's opinion, the
participants in group AI seemed to have a slight edge over
those in groups AT and DT with respect to native programming
ability, while groups AI and AT seemed slightly favored over
group DT with respect to formal training in the application
area.
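
A check of this kind can be approximated by a randomization
procedure: reassign the participants to groups at random many
times and see how often the random grouping is at least as
uneven as the observed one. The sketch below is one possible
reading, not the procedure actually used; the unevenness
statistic (range of the group mean grades), the trial count,
and the grade values are all assumptions made for
illustration.

```python
import random
from statistics import mean


def spread_of_group_means(grades, sizes):
    """Range of group means when grades are cut into consecutive groups."""
    means, start = [], 0
    for size in sizes:
        means.append(mean(grades[start:start + size]))
        start += size
    return max(means) - min(means)


def fraction_at_least_as_uneven(grades, sizes, trials=2000, seed=0):
    """Share of random regroupings whose spread is >= the observed spread."""
    rng = random.Random(seed)
    observed = spread_of_group_means(grades, sizes)
    shuffled = list(grades)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if spread_of_group_means(shuffled, sizes) >= observed:
            hits += 1
    return hits / trials


# Made-up grades on a 4-point scale, listed group by group (AI, AT, DT).
sizes = [6, 18, 21]
grades = [4, 3, 4, 3, 3, 4] + [3, 4, 3, 2, 4, 3] * 3 + [3, 3, 4, 2, 4, 3, 3] * 3
print(fraction_at_least_as_uneven(grades, sizes))
```

A fraction near 1 would indicate that the observed grouping
is about as homogeneous as chance assignment typically
produces.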

The disciplined methodology imposed on teams in group
DT consisted of an integrated set of state-of-the-art
techniques, including top-down design, process design
language (PDL), functional expansion, design and code
reading, walk-throughs, and chief programmer team
organization. These techniques and organizations were
taught as an integral part of the course that the subjects
were taking, using [Linger, Mills & Witt 79], [Basili &
Baker 75], and [Brooks 75] as textbooks. Since the subjects
were novices in the methodology, they executed it to varying
degrees of thoroughness and were not always as successful as
seasoned users of the methodology would be.

The disciplined methodology prescribed the use of a PDL
for expressing the design of the problem solution. The
design was elaborated in a top-down manner, each level
representing a solution to the problem at a particular level
of abstraction and specifying the functions to be expanded
at the next level. The PDL consisted of a fixed set of
structured control and data structures, plus an open-ended
designer-defined set of operators and operands corresponding
to the level of the solution and the particular application.
Design and code reading involved the critical review of each
team member's PDL or code by at least one other member of
the team. Walk-throughs represented a more formalized
presentation of an individual's work to the other members of
the team in which the PDL or code was explained step by
step. Under the chief programmer team organization, the
chief programmer defined the top-level solution to the
problem in PDL, designed and implemented key portions of
code himself, and assigned subtasks to the other two team
members. Each of these programmers, in turn, code-read for
the chief programmer, designed or coded their assigned
subpieces, and performed librarian activities (i.e.,
entering or revising code stored on-line, making test runs,
etc.).

Two variants of chief programmer team organization,
denoted CP and M, were employed. In both cases, one member
of the team (the chief programmer or the manager) was
responsible for designing and refining the top-level
solution to the problem in PDL, identifying system
components to be implemented, and defining their interfaces.
The two other team members (the programmers) were each
responsible for designing or coding various system
components, as assigned by the chief programmer or manager.
In the CP case, the chief programmer maximized his coding
duties by implementing the key code himself, and the
programmers performed librarian activities (i.e., entering
or revising code stored on-line, making test runs, etc.).
In the M case, the manager minimized his coding duties by
acting as librarian and yielding greater responsibility for
implementation to the programmers. Although there were
(supposedly) four CP teams and three M teams in group DT,
this distinction between the CP and M variants of chief
programmer team organization is not utilized in the present
study, since it is believed that the impact of their common
features transcends any impact due to their differences.
Moreover, in actual practice, it was observed that the CP
and M variants are only identifiable extrema along a
continuum and that the group DT teams all gravitated toward
a comfortable compromise in this respect.

Each individual or team in groups AI or AT was allowed
to develop the software entirely in a manner of his or their
own choosing, which is herein referred to as an ad hoc
approach. No methodology was taught in the course these
subjects were taking. Informal observation by the
researcher confirmed that approaches used by the individuals
and ad hoc teams were indeed lacking in discipline and did
not utilize the key elements of the disciplined methodology
(e.g., an individual working alone cannot practice code
reading, and it was evident that the ad hoc teams did not
employ a PDL or a formal top-down design).

Data Collection and Reduction

Due to the partially exploratory nature of the
experiment in terms of differences to be discovered in the
project and process, as much information was collected as
could be done in an efficient and unobtrusive manner. A
variety of information sources was used. Individual
questionnaires revealed the personal background and
programming experience of each participant. Private team
interviews and in-class team reports provided information
regarding individual performance on the project. "Run logs"
and computer account billing reports gave a record of the
computer activity during the project. Special module
compilation and program execution processors (invoked
on-line via very slight changes to the regular command
language) created an historical data bank of source code and
test data accumulated throughout the project development.

The data bank provided the principal source of
information analyzed in the current investigation and other
information sources have been utilized only in an auxiliary
manner (if at all). Thus, data collection for the
experiments themselves was automated on-line, with
essentially no interference to the programmer's normal
pattern of actions during computer (terminal) sessions. The
final products were isolated from the data bank and measured
for various syntactic and organizational aspects of the
finished product source code. Effort and cost data were
also extracted from the data bank. The inputs to the
analysis, in the form of scores for the various programming
aspects, reflect the quantitatively measured character of
the product and the process. Much of the data reduction was
done automatically within a specially instrumented compiler.
Some was done manually (e.g., examining characteristics
across modules). Due to the underlying collection and
reduction mechanism, which was uniformly applied to all
experimental units, the data used in the analysis has the
characteristics of objectivity, uniformity, and
quantitativeness and is measured on an interval scale of
measurement [Stevens 46]. The raw scores for the measured
programming aspects are summarized in Appendix 1.

Programming Aspects and Metrics

The dependent variables studied in this experiment are
called programming aspects. They represent specific
isolatable and observable features of programming phenomena.
Furthermore, they are measured in an objective and
automatable manner (i.e., they could be extracted or
computed directly on-line from information readily
obtainable from operating systems and compilers). For each
programming aspect there exists an associated metric, a
specific algorithm which ultimately defines that aspect and
by which it is measured.

The programming aspects may be categorized as either
process- or product-related, on the basis of what they
measure. Process aspects represent characteristics of the
development process, in particular, the cost and required
effort as reflected in the number of computer job steps (or
runs) and the amount of textual revision of source code
during development. Product aspects represent
characteristics of the final product that was developed, in
particular, the syntactic content and organization of the
symbolic source code. Examples of product aspects are
number of lines, frequency of particular statement types,
average size of data variables' scope, etc.

The programming aspects may also be categorized as
either rudimentary or elaborative, on the basis of their
conceptual nature. The rudimentary aspects are conceptually
quite simple, reflecting ordinary surface features of the
process or product. For example, the numbers of data
variables and routines in a program are rudimentary aspects;
they pertain to the sheer size of the software and are
somewhat uninteresting in themselves. The elaborative
aspects are conceptually more subtle, reflecting deeper
characteristics of the process or product. For example, the
number of times pairs of routines communicate via data
variables (see the data bindings metric below) is an
elaborative aspect; it pertains to the software's modularity
and is intuitively appealing.

Finally, the programming aspects may be categorized as
either confirmatory or exploratory, on the basis of the
motivation for their inclusion in the study. The
confirmatory aspects had been consciously planned in advance
of collecting and extracting the data, because intuition
suggested that they would serve well as quantitative
indicators of important qualitative characteristics of
software development phenomena. It was predicted a priori
that these confirmatory aspects would verify the study's
basic premises regarding the programming environments being
investigated in the experiment. The exploratory aspects
were considered mainly because they could be collected and
extracted cheaply (even as a natural by-product sometimes)
along with the confirmatory aspects. There was little
serious expectation that these exploratory aspects would be
useful indicators of differences among the groups, but they
were included in the study with the intent of observing as
many aspects as possible on the off chance of discovering
any unexpected tendency or difference. Thus, this study
combines elements of both confirmatory and exploratory data
analysis within one common experimental setting [Tukey 69].
The confirmatory programming aspects are identified in the
accompanying tables by being flagged with asterisks; the
exploratory programming aspects are unflagged.

It should be noted that a large percentage of the
product aspects fall into the rudimentary-exploratory
category. On the whole, these product aspects represent a
fairly extensive taxonomy of the surface features of
software. The idea that important software qualities (e.g.,
"complexity") could be measured by counting such surface
features has generally been disregarded by some researchers
as too simplistic (e.g., [Mills 73, p. 232]). A resolve to
study these surface features empirically, to see if
something might turn up, before rejecting the underlying
idea, was partially responsible for their inclusion in the
study.


The particular programming aspects examined in this
investigation are presented in Chapters IV and V. A
complete list of aspects, together with explanatory notes,
is given in Chapter IV, with definitions for the nontrivial
or unfamiliar metrics. Chapter V contains an in-depth
discussion of the elaborative aspects.



CHAPTER IV


This chapter presents all of the programming aspects
examined in the study. The goal of this chapter, in
conjunction with the next, is to describe each programming
aspect and, where appropriate, to motivate its intuitive
appeal as a software metric. Because the brief explanatory
notes within this chapter do not adequately describe a
certain subset of the aspects (namely, the elaborative
aspects), they are further discussed within the next
chapter.


Table 1 lists the programming aspects examined in this
investigation. They appear grouped according to
definitionally related categories, with indented qualifying
phrases to specify variants of general aspects. When
referring to an individual aspect, a concatenation of the
major phrase with any qualifying phrases (separated by \
symbols) is used.* For example, the aspect label
COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE
refers to a metric involving computer job steps that are
module compilations in which the source code is unique from
all other compiled versions.
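
The labeling convention just described can be sketched in a few lines of Python. This is only an illustration of the concatenation rule; the function name `aspect_label` is hypothetical and does not come from the study.

```python
def aspect_label(major, *qualifiers):
    """Join a major phrase and its qualifying phrases with backslashes.

    Hypothetical helper: it only illustrates the labeling convention.
    """
    return "\\".join((major,) + qualifiers)


# The example label from the text, built from its three phrases.
print(aspect_label("COMPUTER JOB STEPS", "MODULE COMPILATION", "UNIQUE"))
```

A general aspect with no qualifying phrases keeps its major phrase unchanged as its label.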

In order to avoid any misunderstanding, a redundancy
issue must be stated and properly appreciated. Several
instances of duplicate programming aspects exist; that is,
some logically unique aspects reappear with another label,
in order to provide alternative views of a given metric or
to round out a group of related aspects. For example, the
FUNCTION CALLS aspect and the STATEMENT TYPE COUNTS\


* Aspect labels are always written completely in uppercase
letters, while references to general concepts appear in
lowercase letters, with initial or defining occurrences
underlined.


Table 1.  Programming Aspects

Parenthesized numbers refer to the explanatory notes
in Chapter IV.  Asterisks mark the confirmatory aspects; the
exploratory aspects are unmarked.


rudimentary process aspects

(1)       *  COMPUTER JOB STEPS
(2)       *     MODULE COMPILATION
(3)       *        UNIQUE
(3)                IDENTICAL
(4)       *     PROGRAM EXECUTION
(5)             MISCELLANEOUS
(6)       *     ESSENTIAL
(7)          AVERAGE UNIQUE COMPILATIONS PER MODULE
(8)          MAX. UNIQUE COMPILATIONS F.A.O. MODULE

MAX. is an abbreviation for MAXIMUM
F.A.O. is an abbreviation for FOR ANY ONE


elaborative process aspects

(9)          PROGRAM CHANGES


rudimentary product aspects

(10)         MODULES
(12)         SEGMENT TYPE COUNTS :
(11)            FUNCTION
(11)            PROCEDURE
(12)         SEGMENT TYPE PERCENTAGES :
(11)            FUNCTION
(11)            PROCEDURE
(13)         AVERAGE SEGMENTS PER MODULE
(14)      *  LINES
(15)      *  STATEMENTS
(16)         STATEMENT TYPE COUNTS :
(17)            :=
(18)      *     IF
(19)      *     CASE
(20)      *     WHILE
(21)      *     EXIT
(22,99)         (PROC)CALL
(23,99)            NONINTRINSIC
(23,99)            INTRINSIC
(24)      *     RETURN
(16)         STATEMENT TYPE PERCENTAGES :
(17)            :=
(18)      *     IF
(19)      *     CASE
(20)      *     WHILE
(21)            EXIT
(22)            (PROC)CALL
(23)               NONINTRINSIC
(23)               INTRINSIC
(24)      *     RETURN
(25)      *  AVERAGE STATEMENTS PER SEGMENT
(26)         AVERAGE STATEMENT NESTING LEVEL
(27)         DECISIONS
(22,99)      FUNCTION CALLS
(23,99)         NONINTRINSIC
(23,99)         INTRINSIC
(28)         TOKENS
(28)         AVERAGE TOKENS PER STATEMENT
(29)         INVOCATIONS
(11,99)         FUNCTION
(23,99)            NONINTRINSIC
(23,99)            INTRINSIC
(11,99)         PROCEDURE
(23,99)            NONINTRINSIC
(23,99)            INTRINSIC
(23)            NONINTRINSIC
(23)            INTRINSIC
(30)         AVG. INVOCATIONS PER (CALLING) SEGMENT
(11)            FUNCTION
(23)               NONINTRINSIC
(23)               INTRINSIC
(11)            PROCEDURE
(23)               NONINTRINSIC
(23)               INTRINSIC
(23,99)         NONINTRINSIC
(23)            INTRINSIC
(31,99)      AVG. INVOCATIONS PER (CALLED) SEGMENT
(11)            FUNCTION
(11)            PROCEDURE
(32)         DATA VARIABLES
(37)         DATA VARIABLE SCOPE COUNTS :
(33)      *     GLOBAL
(34)               ENTRY
(35)                  MODIFIED
(35)                  UNMODIFIED
(34)               NONENTRY
(35)                  MODIFIED
(35)                  UNMODIFIED
(35)               MODIFIED
(35)               UNMODIFIED
(33)            NONGLOBAL
(33)      *        PARAMETER
(36)                  VALUE
(36)                  REFERENCE
(33)      *        LOCAL
(37)         DATA VARIABLE SCOPE PERCENTAGES :
(33)      *     GLOBAL
(34)               ENTRY
(35)                  MODIFIED
(35)                  UNMODIFIED
(34)               NONENTRY
(35)                  MODIFIED
(35)                  UNMODIFIED
(35)               MODIFIED
(35)               UNMODIFIED
                NONGLOBAL
                   PARAMETER
                      VALUE
                      REFERENCE
                   LOCAL


             AVERAGE GLOBAL VARIABLES PER MODULE
                ENTRY
                NONENTRY
                MODIFIED
                UNMODIFIED
             AVERAGE NONGLOBAL VARIABLES PER SEGMENT
                PARAMETER
                LOCAL
             PARAMETER PASSAGE TYPE PERCENTAGES :
                VALUE
                REFERENCE
             (SEGMENT,GLOBAL) ACTUAL USAGE PAIRS
                ENTRY
                   MODIFIED
                   UNMODIFIED
                NONENTRY
                   MODIFIED
                   UNMODIFIED
                MODIFIED
                UNMODIFIED
             (SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS
                ENTRY
                   MODIFIED
                   UNMODIFIED
                NONENTRY
                   MODIFIED
                   UNMODIFIED
                MODIFIED
                UNMODIFIED
             (SEGMENT,GLOBAL) USAGE PAIR REL. PERCENT.
                ENTRY
                   MODIFIED
                   UNMODIFIED
                NONENTRY
                   MODIFIED
                   UNMODIFIED
                MODIFIED
                UNMODIFIED

AVG. is an abbreviation for AVERAGE
REL. PERCENT. is an abbreviation for RELATIVE PERCENTAGE


elaborative product aspects

             CYCLOMATIC COMPLEXITY
                SIMPPRED-NCASE VARIATION :
                   TOTAL
                   #SEGS : CC>=10
                   0.5 QUANTILE POINT VALUE
                   0.5 QUANTILE TAIL AVERAGE
                   0.7 QUANTILE POINT VALUE
                   0.7 QUANTILE TAIL AVERAGE
                   0.8 QUANTILE POINT VALUE
                   0.8 QUANTILE TAIL AVERAGE
                   0.9 QUANTILE POINT VALUE
                   0.9 QUANTILE TAIL AVERAGE
                SIMPPRED-LOGCASE VARIATION :
                   TOTAL
                   #SEGS : CC>=10
                   0.5 QUANTILE POINT VALUE
                   0.5 QUANTILE TAIL AVERAGE
                   0.7 QUANTILE POINT VALUE
                   0.7 QUANTILE TAIL AVERAGE
                   0.8 QUANTILE POINT VALUE
                   0.8 QUANTILE TAIL AVERAGE
                   0.9 QUANTILE POINT VALUE
                   0.9 QUANTILE TAIL AVERAGE
                COMPPRED-NCASE VARIATION :
                   TOTAL
                   #SEGS : CC>=10
                   0.5 QUANTILE POINT VALUE
                   0.5 QUANTILE TAIL AVERAGE
                   0.7 QUANTILE POINT VALUE
                   0.7 QUANTILE TAIL AVERAGE
                   0.8 QUANTILE POINT VALUE
                   0.8 QUANTILE TAIL AVERAGE
                   0.9 QUANTILE POINT VALUE
                   0.9 QUANTILE TAIL AVERAGE
                COMPPRED-LOGCASE VARIATION :
                   TOTAL
                   #SEGS : CC>=10
                   0.5 QUANTILE POINT VALUE
                   0.5 QUANTILE TAIL AVERAGE
                   0.7 QUANTILE POINT VALUE
                   0.7 QUANTILE TAIL AVERAGE
                   0.8 QUANTILE POINT VALUE
                   0.8 QUANTILE TAIL AVERAGE
                   0.9 QUANTILE POINT VALUE
                   0.9 QUANTILE TAIL AVERAGE
             (SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS :
                ACTUAL
                   SUBFUNCTIONAL
                   INDEPENDENT
                POSSIBLE
                RELATIVE PERCENTAGE
             SOFTWARE SCIENCE QUANTITIES :
                VOCABULARY
                LENGTH
                ESTIMATED LENGTH
                %DIFFERENCE(N,N^)
                VOLUME
                INTELLIGENCE CONTENT
                ESTIMATED BUGS
                1ST CALCULATION METHOD :
                   PROGRAM LEVEL
                   DIFFICULTY
                   POTENTIAL VOLUME
                   LANGUAGE LEVEL
                   EFFORT
                   ESTIMATED TIME
                2ND CALCULATION METHOD :
                   PROGRAM LEVEL
                   DIFFICULTY
                   POTENTIAL VOLUME
                   %DIFFERENCE(V*,I)
                   LANGUAGE LEVEL
                   EFFORT
                   ESTIMATED TIME


(PROC)CALL aspect are each labeled and categorized from the
viewpoint of implementation language construct frequencies.
But the same metrics can also be considered from the
viewpoint of segment invocation frequencies, warranting the
inclusion of the two duplicate aspects INVOCATIONS\FUNCTIONS
and INVOCATIONS\PROCEDURES as variants of the general
INVOCATIONS aspect. Among the 197 programming aspects
listed in Table 1, there are 8 pairs of duplicate aspects
(identified in note 99 below), leaving 189 nonredundant
aspects examined in the study. By definition, the data
scores obtained for any pair of duplicate aspects will be
identical, and thus the same statistical conclusions will be
reached for both aspects. This redundancy must be kept in
mind when evaluating the results of the experiments.

Brief explanatory notes about the programming aspects
are given below, in the form of numbered paragraphs keyed to
the list in Table 1, with definitions for the nontrivial or
unfamiliar metrics. These notes usually supply loose
explanations for the general concepts behind these
programming aspects, before mentioning any restrictions or
variations in how they were applied and measured in this
study. Technical meanings for system- or language-dependent
terms (e.g., module, segment, intrinsic, entry) also appear
here. Since computer programming terminology is not
standardized, the reader is cautioned against drawing
inferences not based on this dissertation's definitions.

Notes for the Programming Aspects

(1) A computer job step is a conceptually indivisible
programmer-oriented activity that is performed on a computer
at the operating system command level, is inherent to the
software development effort, and involves a nontrivial
expenditure of computer or human resources. Ideally
speaking, examples of job steps would include editing
symbolic texts, compiling source modules, link-editing (or
collecting) object modules, and executing entire programs;
however, operations such as querying the operating system
for status information or requesting access to on-line files
would not qualify as job steps. In this study,
consideration for the COMPUTER JOB STEPS aspect was limited
exclusively to the activities of compiling source modules or
executing entire programs, but not all of the activities so
counted dealt with the final product (or logical
predecessors thereof).

(2) A module compilation is an invocation of the
implementation language processor on the source code of an
individual module. In this study, only compilations of
modules comprising the final software product (or logical
predecessors thereof) are counted in the COMPUTER JOB STEPS\
MODULE COMPILATION aspect.

(3) All module compilations are (sub)categorized as
either identical or unique depending on whether or not the
source code compiled is textually identical to that of a
previous compilation. During the development process, each
unique compilation was necessary in some sense, while an
identical compilation could conceivably have been avoided by
saving the (relocatable) object module from a previous
compilation for later reuse (except in the situation of
undoing source code revisions after they have been tested
and found to be erroneous or superfluous).

(4) A program execution is an invocation of a complete
programmer-developed program (after the necessary
compilation(s) and link-editing) upon some test data. In
this study, only executions of programs composed of modules
comprising the final product (or logical predecessors
thereof) are counted in the COMPUTER JOB STEPS\PROGRAM
EXECUTION aspect.

(5) A miscellaneous job step is an auxiliary
compilation or execution of something other than the final
software product. In this study, the COMPUTER JOB STEPS\
MISCELLANEOUS aspect counts exactly those activities
included in the COMPUTER JOB STEPS aspect but not included
in the COMPUTER JOB STEPS\MODULE COMPILATION or COMPUTER JOB
STEPS\PROGRAM EXECUTION aspects.

(6) An essential job step is a computer job step that
involves the final software product (or logical predecessors
thereof) and could not have been avoided (by off-line
computation or by on-line storage of previous compilations
or results). In this study, the COMPUTER JOB STEPS\
ESSENTIAL aspect is the sum of the COMPUTER JOB STEPS\MODULE
COMPILATION\UNIQUE aspect and the COMPUTER JOB STEPS\PROGRAM
EXECUTION aspect.

(7) The AVERAGE UNIQUE COMPILATIONS PER MODULE aspect
is a way of normalizing the COMPUTER JOB STEPS\MODULE
COMPILATION\UNIQUE aspect.

(8) The MAXIMUM UNIQUE COMPILATIONS FOR ANY ONE MODULE
aspect is another way of normalizing (by isolating the worst
case) the COMPUTER JOB STEPS\MODULE COMPILATION\UNIQUE
aspect.
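
The job step aspects of notes (1) through (8) can be derived from a single chronological log, as the following Python sketch suggests. It is only an illustration under stated assumptions: the log format and the name `job_step_aspects` are hypothetical, a compilation is treated as identical when the same module was compiled earlier with the same source text, and every module in the log is assumed to belong to the final product.

```python
from collections import defaultdict


def job_step_aspects(steps):
    """Summarize a chronological log of job steps.

    Each step is a tuple (hypothetical log format, not the study's):
      ("compile", module_name, source_text)  compilation of a product module
      ("execute",)                           execution of the product
      ("misc",)                              any other compilation or execution
    """
    seen = defaultdict(set)      # module -> source texts compiled so far
    unique = defaultdict(int)    # module -> number of unique compilations
    identical = executions = misc = 0
    for step in steps:
        kind = step[0]
        if kind == "compile":
            _, module, source = step
            if source in seen[module]:
                identical += 1   # same text as an earlier compilation
            else:
                seen[module].add(source)
                unique[module] += 1
        elif kind == "execute":
            executions += 1
        else:
            misc += 1
    total_unique = sum(unique.values())
    return {
        "job_steps": len(steps),
        "unique_compilations": total_unique,
        "identical_compilations": identical,
        "program_executions": executions,
        "miscellaneous": misc,
        # Note (6): unique compilations plus program executions.
        "essential": total_unique + executions,
        # Notes (7) and (8): two normalizations of the unique count.
        "avg_unique_per_module": total_unique / len(unique) if unique else 0.0,
        "max_unique_for_any_module": max(unique.values(), default=0),
    }
```

Keeping the per-module tally separate from the totals is what allows both the average and the worst-case normalization to be read off the same pass over the log.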

(9) The program changes metric [Dunsmore & Gannon 77]
is a measure of the total amount of textual revision made to
program source code during the (postdesign) software
development period. The rules for counting program changes
are designed to identify individual conceptual changes
algorithmically. Each occurrence of the following revisions
is counted as a single program change: modification of a
single statement, insertion of contiguous statements, or
modification of a single statement followed immediately by
insertion of contiguous statements. However, the following
revisions are not counted as program changes: deletion of
contiguous statements, insertion of standard output
statements or special compiler-provided debugging
directives, and instances of lexical reformatting without
syntactic/semantic alteration.

See Chapter V for further discussion of the program
changes metric.
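
The counting rules of note (9) can be approximated with a statement-level diff, as in the Python sketch below. This is not the algorithm used in the study, only one possible reading of the rules: each version is assumed to be a list of statement strings, whitespace is collapsed so that lexical reformatting is ignored, and `is_excluded` is a hypothetical predicate that marks output statements and debugging directives. A replaced run is counted as the smaller of its old and new lengths, on the assumption that surplus new statements belong to the last modification and surplus old statements are deletions.

```python
import difflib


def _normalize(statement):
    """Collapse whitespace so lexical reformatting is not seen as a change."""
    return " ".join(statement.split())


def program_changes(old, new, is_excluded=lambda statement: False):
    """Count program changes between two versions of a module.

    old, new: lists of statements, one string per statement (assumed format).
    is_excluded: hypothetical predicate for output/debugging statements.
    """
    a = [_normalize(s) for s in old if not is_excluded(s)]
    b = [_normalize(s) for s in new if not is_excluded(s)]
    changes = 0
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            changes += 1  # one run of contiguous inserted statements
        elif tag == "replace":
            # Each modified statement counts once; surplus new statements
            # ride along with the last modification, surplus old statements
            # are treated as deletions.
            changes += min(i2 - i1, j2 - j1)
        # "delete" and "equal" runs contribute nothing.
    return changes
```

Summing this count over every pair of successive compiled versions of each module would give a total for the development period, again under the assumptions stated above.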

(10) A module is a separately compiled portion of the
complete software system. In the implementation language
SIMPL-T, a typical module is a collection of the
declarations of several global variables and the definitions
of several segments. In this study, only those modules
which comprise the final product are counted in the MODULES
aspect.

(11) A segment is a collection of source code
statements, together with declarations for the formal
parameters and local variables manipulated by those
statements, that may be invoked as an operational unit. In
the implementation language SIMPL-T, a segment is either a
value-returning function (invoked via reference in an
expression) or else a non-value-returning procedure (invoked
via the CALL statement). The segment, function, and
procedure of SIMPL-T correspond to the (sub)program,
function, and subroutine of Fortran, respectively.

(12) The group of aspects named SEGMENT TYPE COUNTS
gives the absolute number of programmer-defined segments of
each type. The group of aspects named SEGMENT TYPE
PERCENTAGES gives the relative percentage of each type of
segment, compared with the total number of programmer-
defined segments. The latter group of aspects is a way of
normalizing the former group of aspects.

(13) Since in the implementation language SIMPL-T
segment definitions occur within the context of a module, a
natural way to normalize (or average) the raw counts of
segments is provided. The AVERAGE SEGMENTS PER MODULE
aspect represents the average size, in segments, of modules
in the program.

(14) The LINES aspect counts every textual line of
delivered source code in the final product, including
comments, compiler directives, variable declarations,
executable statements, etc.

(15) The STATEMENTS aspect counts only the executable
constructs in the source code of the final product. These
are high-level, structured-programming statements, including
simple statements--such as assignment and procedure call--as
well as compound statements--such as if-then-else and while-
do--which have other statements nested within them. The
implementation language SIMPL-T allows exactly seven
different statement types (referred to by their
distinguishing keyword or symbol) covering assignment (:=),
alternation-selection (IF, CASE), iteration (WHILE, EXIT),
and procedure invocation (CALL, RETURN). Input-output
operations are accomplished via calls to intrinsic
procedures.

(16) The group of aspects named STATEMENT TYPE COUNTS
gives the absolute number of executable statements of each
type. The group of aspects named STATEMENT TYPE PERCENTAGES
gives the relative percentage of each type of statement,
compared with the total number of executable statements.
The latter group of aspects is a way of normalizing the
former group of aspects.
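
The relation between the count and percentage groups in notes (12) and (16) is a simple normalization, sketched below in Python for statement types. The input format, a flat list naming the type of each executable statement, and the function names are assumptions made for illustration.

```python
from collections import Counter

# The seven SIMPL-T statement types named in note (15).
STATEMENT_TYPES = (":=", "IF", "CASE", "WHILE", "EXIT", "CALL", "RETURN")


def statement_type_counts(statement_kinds):
    """Absolute number of statements of each type (zero if a type is absent)."""
    tally = Counter(statement_kinds)
    return {kind: tally[kind] for kind in STATEMENT_TYPES}


def statement_type_percentages(counts):
    """Each count as a percentage of the total number of statements."""
    total = sum(counts.values())
    if total == 0:
        return {kind: 0.0 for kind in counts}
    return {kind: 100.0 * n / total for kind, n in counts.items()}
```

The same two steps would apply to the segment type groups, with FUNCTION and PROCEDURE in place of the statement types.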

(17) As mentioned above, the := symbol denotes the
assignment statement. It assigns the value of the
expression on the right hand side to the variable on the
left hand side.

(18) Both if-then and if-then-else constructs are
counted as IF statements. Each IF statement allows the
execution of either the then- or else-part statements,
depending upon its boolean expression.

(19) The CASE statement provides for selection from
several alternatives, depending upon the value of an
expression. In the implementation language SIMPL-T, exactly
one of the alternatives (or an optional else-part) is
selected per execution of a CASE, a list of constants is
explicitly given for each alternative, and selection is
based upon the equality of the expression value with one of
the constants. (These constants are referred to as 'case
labels'; these alternatives, as 'case branches.') A case
construct with n alternatives is logically and
semantically equivalent to a series of n nested if-then-
else constructs.
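
The equivalence stated at the end of note (19) can be illustrated in Python rather than SIMPL-T. In this sketch the case labels, branch results, and function names are all invented for the example; `case_select` models a CASE with three alternatives and an else-part, and `nested_if` spells out the same selection as three nested if-then-else constructs.

```python
def case_select(value, branches, default=None):
    """Select the branch whose case labels contain the value.

    branches: list of (case_labels, result) pairs; default is the else-part.
    """
    for labels, result in branches:
        if value in labels:
            return result
    return default


# Three alternatives, each with its explicit list of constant labels.
BRANCHES = [((1, 2), "small"), ((3,), "medium"), ((4, 5), "large")]


def nested_if(value):
    """The same selection written as three nested if-then-else constructs."""
    if value == 1 or value == 2:
        return "small"
    else:
        if value == 3:
            return "medium"
        else:
            if value == 4 or value == 5:
                return "large"
            else:
                return "other"
```

Both forms select exactly one alternative per execution, which is why a CASE with n alternatives can be viewed as n decisions.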

(20) The WHILE statement is the only iteration or
looping construct provided by the implementation language
SIMPL-T. It allows the statements in the loop body to be
executed repeatedly (zero or more times) depending upon a
boolean expression which is reevaluated at every iteration;
the loop may also be terminated via an EXIT statement. Each
WHILE statement may be optionally labeled with a designator
(referenced by EXIT statements) which uniquely identifies it
from other nested WHILE statements.

(21) The EXIT statement allows the abnormal
termination of iteration loops by unconditional transfer of
control to the statement immediately following the WHILE
statement. Thus it is a very restricted form of GOTO. This
exiting may take place from any depth of nested loops, since
the EXIT statement may optionally name a designator which
identifies the loop to be exited; without such a designator
only the immediately enclosing loop is exited.

(22) Since there are two types of segments in the
implementation language SIMPL-T, there are two types of
"calls" or segment invocations. Procedures are invoked via
the CALL statement, and functions are invoked via reference
in an expression. The counts for these separate constructs
are reported separately as the (PROC)CALL and FUNCTION CALL
aspects, and jointly as the INVOCATIONS aspect.

(23) Intrinsic means provided and defined by the
implementation language; nonintrinsic means provided and
defined by the programmer. These terms are used to
distinguish built-in procedures or functions (which are
supported by the compiler and utilized as primitives) from
segments (which are written by the programmer). Nearly all
of the intrinsic procedures in the implementation language
SIMPL-T perform input-output operations and external data
file manipulations. All of the intrinsic functions in
SIMPL-T perform data type coercions and character string
operations.

(24) The RETURN statement allows the abnormal
termination of the current segment by unconditional
resumption of the previously executing segment. Thus it is
another very restricted form of GOTO. Within a function, a
RETURN statement must specify an expression, the value of
which becomes the value returned for the function
invocation. Within a procedure, a RETURN statement must not
specify such an expression. Additionally, a simple RETURN
statement is optional at the textual end of procedures; it
will be implicitly assumed if not explicitly coded. In this
study, the total number of explicitly coded and implicitly
assumed RETURN statements, both from functions and
procedures combined, is counted.

(25) The AVERAGE STATEMENTS PER SEGMENT aspect
provides a way of normalizing the number of statements
relative to their natural enclosure in a program, the
segment. The measure also represents the average length, in
executable statements, of segments in the program.

(26) In the implementation language SIMPL-T, both
simple (e.g., assignment) and compound (e.g., if-then-else)
statements may be nested inside other compound statements.
A particular nesting level is associated with each
statement, starting at 1 for a statement at the outermost
level of each segment and increasing by 1 for successively
nested statements. Nesting level can be displayed visually
via proper and consistent indentation of the source code
listing.
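
The nesting level rule of note (26) lends itself to a short recursive sketch in Python. The representation of a segment body as nested `(kind, body)` pairs is hypothetical, and averaging over all statements of all segments is an assumption about how the AVERAGE STATEMENT NESTING LEVEL aspect might be computed.

```python
def nesting_levels(statements, level=1):
    """Yield the nesting level of every statement in a segment body.

    statements: list of (kind, body) pairs, where body is the list of
    statements nested inside a compound statement (empty for simple ones).
    """
    for _kind, body in statements:
        yield level                      # outermost statements are at level 1
        yield from nesting_levels(body, level + 1)


def average_nesting_level(segments):
    """Average nesting level over all statements of all segment bodies."""
    levels = [lvl for body in segments for lvl in nesting_levels(body)]
    return sum(levels) / len(levels) if levels else 0.0


# A small segment body: an assignment, a WHILE containing an IF and an
# assignment, and a final RETURN.
EXAMPLE_SEGMENT = [
    (":=", []),
    ("WHILE", [("IF", [(":=", []), ("EXIT", [])]), (":=", [])]),
    ("RETURN", []),
]
```

For the example segment, the statements sit at levels 1, 1, 2, 3, 3, 2, and 1 in textual order, mirroring how the indentation of a listing would display them.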

(27) The DECISIONS aspect is the sum of the numbers of
IF, CASE, and WHILE statements within the program's source
code. Each of these statements represents a unique
(possibly repeated) run-time decision coded by the
programmer. Because the implementation language SIMPL-T has
only structured control structures, this aspect is closely
related to the cyclomatic complexity metrics discussed
below.

(28) Tokens are the basic syntactic entities--such as
keywords, operators, parentheses, identifiers, etc.--that
occur in a program statement. The average number of tokens
per statement may be viewed as an indication of how much
"information" a typical statement contains, how "powerful" a
typical statement is, or how concisely the statements are
coded.
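
A token count of the kind described in note (28) can be sketched with a small regular-expression tokenizer in Python. The lexical rules below are assumptions for illustration and are not SIMPL-T's actual token definitions; they recognize identifiers and keywords, integers, quoted strings, a few two-character operators, and otherwise any single non-blank character.

```python
import re

# Hypothetical lexical rules, tried in order at each position.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|'[^']*'|:=|<=|>=|<>|\S")


def tokens(statement):
    """Split one statement into its tokens."""
    return TOKEN.findall(statement)


def average_tokens_per_statement(statements):
    """Total tokens divided by the number of statements."""
    if not statements:
        return 0.0
    return sum(len(tokens(s)) for s in statements) / len(statements)
```

Under these rules the statement `X := Y + 1` has five tokens, so a program of such statements would score lower than one built from longer expressions.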

(29) An invocation is the syntactic occurrence of a
construct by which either a programmer-defined segment or a
built-in routine is invoked from within another segment;
both procedure calls and function references are counted as
INVOCATIONS. They are (sub)categorized by the type (i.e.,
function or procedure, nonintrinsic or intrinsic) of segment
or routine being invoked.

(30) The group of aspects named AVERAGE INVOCATIONS
PER (CALLING) SEGMENT represents one way to normalize the
absolute number of invocations. These aspects reflect the
average number of calls to programmer-defined segments and
built-in routines from a programmer-defined segment. They
are (sub)categorized by the type of segment or routine being
invoked.

(31) The group of aspects named AVERAGE INVOCATIONS
PER (CALLED) SEGMENT represents another way to normalize the
absolute number of invocations. These aspects reflect the
average number of calls to a programmer-defined segment from
other segments. They are (sub)categorized by the type
(i.e., function or procedure) of segment being invoked.
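
The two normalizations of notes (30) and (31) can be compared with a short Python sketch. Everything about the input is assumed for illustration: `calls` lists each syntactic invocation as a (calling segment, invoked name) pair, `segments` is the set of programmer-defined segment names, any other invoked name is taken to be an intrinsic routine, and both averages divide by the total number of programmer-defined segments.

```python
def average_invocations(calls, segments):
    """Average invocation counts per calling and per called segment.

    calls: list of (calling_segment, invoked_name) syntactic occurrences.
    segments: set of programmer-defined segment names; any other invoked
    name is assumed to be an intrinsic (built-in) routine.
    """
    n = len(segments)
    nonintrinsic = sum(1 for _caller, callee in calls if callee in segments)
    intrinsic = len(calls) - nonintrinsic
    return {
        "per_calling": len(calls) / n,
        "per_calling_nonintrinsic": nonintrinsic / n,
        "per_calling_intrinsic": intrinsic / n,
        # Only programmer-defined segments can be "called" segments.
        "per_called": nonintrinsic / n,
    }
```

Under these assumptions the nonintrinsic average per calling segment and the average per called segment come out identical, since both divide the same number of invocations by the same number of segments.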

(32) A data variable is an individually named scalar
or structure. The implementation language SIMPL-T provides:
(a) three data types for scalars--integer, character, and
(varying-length) string;
(b) one kind of data structure (besides scalar)--single
dimensional array, with zero-origin subscript range;
and
(c) several levels of scope (as explained in note 33 below)
for data variables.

In addition, all data variables in a SIMPL-T program must be
explicitly declared, with attributes fully specified. The
DATA VARIABLES aspect counts each data variable declared in
the final software product once, regardless of its type,
structure, or scope. Note that each array is counted as a
single data variable.

(53)  in  the  implementation  language  SI^PL-T,  data 
variables  can  have  any  one  of  essentially  four  levels  of 
3coge--entry  global,  nonentry  global,  parameter,  and  Local-- 
deoenaing  on  where  and  how  they  are  declared  in  the  program. 
Note  that  the  notion  of  scope  deals  only  with  static 
accessibility  by  name;  the  effective  accessibility  of  any 
variaole  can  always  be  extended  by  passing  it  as  a  parameter 
between  segments*  The  scope  levels  are  explained  here  (and 
presented  in  the  aspect  ( su b )c a t egor i z a t i ons  )  via  a 
Hierarchy  of  distinctions* 

The  primary  distinction  is  between  global  ano 
nunglobal*  Qlogal  variables  are  accessible  by  name  to  each 
tne  segmencs  in  the  module  in  which  they  are  dec  larea* 
Iiy2yi2£ai  variables  are  accessible  oy  name  only  to  the 
single  segment  in  which  they  are  declared* 

Global variables are secondarily distinguished into
entry and nonentry categories. Entry globals may be
accessible by name to each of the segments in several
modules (as explained in note 34 below). Nonentry globals
are accessible by name only within the module in which they
are declared.

Nonglobal variables are secondarily distinguished into
formal parameters and locals. Formal parameters are
accessible by name only within the enclosing (called)
segment, but their values are related to the calling segment
(as explained in note 36 below). Locals are accessible by
name only within the enclosing segment, and their values are
completely isolated from any other segment.

(34) Entry means that the data variable (explicitly
declared as ENTRY in one module) is accessible from within
other separately compiled modules (in which it must be
explicitly declared as EXTernal). Nonentry means that the
data variable is accessible only within the module in which
it is declared. In this study, these terms are used only in
reference to global variables, although the implementation
language SIMPL-T handles the accessibility of segments
across modules in the same way.

Although the implementation language SIMPL-T does allow
the EXTernal attribute to be declared locally so that just
the enclosing segment has access to an identifier declared
as ENTRY in another module, this feature is seldom used; it
never occurred in any of the final software products
examined in this study.

(35) Modified means referred to, at least once in the
program source code, in such a manner that the value of the
data variable might be (re)set when (and if) the appropriate
statements were to be executed. Data variables can be
(re)set only by (a) being the "target" of an assignment
statement, (b) being passed by reference to a
programmer-defined segment or built-in routine, or (c) being
named in an "input statement." (This third case is really
covered by the second case since all the "input statements"
in SIMPL-T are actually calls to certain intrinsic
procedures with passed-by-reference parameters.) Unmodified
means referred to, throughout the program source code, in
such a manner that the value of the data variable could
never be (re)set during execution. These terms refer only to
global data variables.

Any global variable is allowed to have an initial value
(constants only) specified in its declaration. Globals
which are initialized but unmodified are especially useful
in SIMPL-T programs, serving as "named constants."

(36) The implementation language SIMPL-T allows two
types of parameter passage. Pass-by-value means that the
value of the actual argument is copied (upon invocation)
into the corresponding formal parameter (which thereafter
behaves like a local variable for all intents and purposes);
the effect is that the called routine cannot modify the
value of the calling segment's actual argument.
Pass-by-reference means that the address of the actual
argument (which must be a variable rather than an
expression) is passed (upon invocation) to the called
routine; the effect is that any changes made by the called
routine to the corresponding formal parameter will be
reflected in the value of the calling segment's actual
argument (upon return). In SIMPL-T, formal parameters that
are scalars are normally (default) passed by value, but they
may be explicitly declared to be passed by reference; formal
parameters that are arrays are always passed by reference.

(37) The group of aspects named DATA VARIABLE SCOPE
COUNTS gives the absolute number of declared data variables
according to each level of scope. The group of aspects
named DATA VARIABLE SCOPE PERCENTAGES gives the relative
percentage of variables at each scope level, compared with
the total number of declared variables. The latter group of
aspects is a way of normalizing the former group of
aspects.

(38) A natural way to normalize (or average) the raw
counts of data variables is provided, since data variable
declarations in the implementation language SIMPL-T may only
appear in certain contexts within the program: globals in
the context of a module and nonglobals in the context of a
segment. The group of aspects named AVERAGE GLOBAL
VARIABLES PER MODULE represents the average number of
globals declared for a module. The group of aspects named
AVERAGE NONGLOBAL VARIABLES PER SEGMENT represents the
average number of nonglobals declared for a segment.

(39) Since there are two types of parameter passing
mechanisms in the implementation language SIMPL-T (as
explained in note 36 above), it is desirable to normalize
their raw frequencies into relative percentages, indicating
the programmer's degree of "preference" for one type or the
other. The group of aspects named PARAMETER PASSAGE TYPE
PERCENTAGES gives the percentages of each type of parameter
relative to the total number of parameters declared in the
program.

(40) A segment-global usage pair (p,r) is an
instance of a global variable r being used by a segment
p (i.e., the global is either modified (set) or accessed
(fetched) at least once within the statements of the
segment). Each usage pair represents a unique "use
connection" between a global and a segment. In this study,
segment-global usage pairs were (sub)categorized by the type
(i.e., entry or nonentry, modified or unmodified) of global
data variable involved and were counted in three different
ways.

First, the (SEGMENT,GLOBAL) ACTUAL USAGE PAIRS aspects
count the absolute numbers of realized usage pairs (p,r):
the global variable r is actually used by segment p.
They represent the frequencies of use connections realized
within the program. Second, the (SEGMENT,GLOBAL) POSSIBLE
USAGE PAIRS aspects count the absolute numbers of potential
usage pairs (p,r), given the program's global variables
and their declared scope: the scope of global variable r
merely contains segment p, so that p could potentially
modify or access r. These counts of possible usage pairs
are computed as the sum of the number of segments in each
global's scope. They represent a sort of "worst case"
frequencies of use connections. Third, the (SEGMENT,GLOBAL)
USAGE PAIR RELATIVE PERCENTAGE aspects are a way of
normalizing the number of usage pairs since these measures
are ratios (expressed as percentages) of actual usage pairs
to possible usage pairs. They represent the frequencies of
realized use connections relative to potential use
connections. These usage pair relative percentage metrics
are empirical estimates of the likelihood that an arbitrary
segment uses (i.e., sets or fetches the value of) an
arbitrary global variable.

In some sense, all three types of aspects dealing with
segment-global usage pairs (actual, possible, and relative
percentage) reflect quantifiable characteristics of "data
modularization" within a program, i.e., the static
organization of data definitions and references within
segments and modules. In particular, the possible usage
pairs aspects reflect the general degree of encapsulation
enforced by the implementation language for global
variables. Moreover, the usage pair relative percentage
aspects reflect the general degree of "globality" for global
variables, i.e., the extent to which globals are actually
used by those segments that could possibly do so.
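
The three counting schemes of note 40 can be illustrated
with a small Python sketch. It is not part of the original
study: the function name usage_pair_metrics, the dictionary
layout, and the segment and global names are hypothetical,
and the sketch assumes that each global's scope is already
available as a set of segment names.

```python
def usage_pair_metrics(scope, uses):
    """Count actual and possible (segment, global) usage pairs.

    scope: dict mapping each global to the set of segments within its scope.
    uses:  dict mapping each segment to the set of globals it sets or fetches.
    Returns (actual, possible, relative_percentage).
    """
    # Actual pairs: segment p really uses global r (and r is in scope for p).
    actual = {
        (p, r)
        for p, used in uses.items()
        for r in used
        if p in scope.get(r, set())
    }
    # Possible pairs: sum of the number of segments in each global's scope.
    possible = sum(len(segments) for segments in scope.values())
    relative = 100.0 * len(actual) / possible if possible else 0.0
    return len(actual), possible, relative


# Hypothetical one-module program: two globals visible to three segments.
scope = {
    "COUNT": {"MAIN", "READ", "PRINT"},
    "BUF": {"MAIN", "READ", "PRINT"},
}
uses = {
    "MAIN": {"COUNT"},
    "READ": {"COUNT", "BUF"},
    "PRINT": {"BUF"},
}
actual, possible, relative = usage_pair_metrics(scope, uses)
print(actual, possible, round(relative, 1))  # 4 6 66.7
```

In this made-up example four of the six possible use
connections are realized, so the relative percentage is
about 66.7.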

(41) A segment-global-segment data binding [Stevens,
Myers & Constantine 74, pp. 118-119] (p,r,q) is defined as
an occurrence of the following arrangement in a program: a
segment p modifies (sets) a global variable r that is
also accessed (fetched) by a segment q, with p different
from q. The (SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS
aspects count these unique communication paths between pairs
of segments via global variables. These aspects thus
reflect the degree of one form of connectivity within a
program.

See Chapter V for further discussion of the data
bindings metrics.

(42) In this study, segment-global-segment data
bindings were counted in three different ways: ACTUAL,
POSSIBLE, and RELATIVE PERCENTAGE. First, the DATA
BINDINGS\ACTUAL aspect counts the total number of data
bindings actually coded in the program, reflecting the
degree of realized connectivity. Second, the DATA BINDINGS\
POSSIBLE aspect counts the total number of data bindings
that could possibly be allowed, given the program's
organizational structure. It reflects the degree of
potential connectivity. Third, the DATA BINDINGS\RELATIVE
PERCENTAGE aspect is the ratio (expressed as a percentage)
of actual data bindings to possible data bindings,
reflecting the normalized degree of realized connectivity
relative to potential connectivity.

See Chapter V for further discussion of the data
bindings metrics.

(43) Actual data bindings are (sub)categorized
depending on the invocation relationship between the two
segments. A data binding (p,r,q) is subfunctional if
either of the two segments p or q can invoke the other,
whether directly or indirectly, as a "subroutine." A data
binding (p,r,q) is independent if neither of the two
segments p or q can invoke the other, whether directly
or indirectly.

See Chapter V for further discussion of the data
bindings metrics.
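
The definitions in notes 41 and 43 can be sketched in
Python as follows. This is an illustration under stated
assumptions, not the tool used in the study: the function
names, the dictionary inputs (which segment sets or fetches
which global, and which segment directly calls which), and
the example program are all hypothetical.

```python
def data_bindings(modifies, accesses):
    """Return the set of (p, r, q): p sets global r, q fetches r, p != q."""
    bindings = set()
    for p, set_by_p in modifies.items():
        for q, fetched_by_q in accesses.items():
            if p != q:
                for r in set_by_p & fetched_by_q:
                    bindings.add((p, r, q))
    return bindings


def can_invoke(calls, a, b):
    """True if segment a can invoke segment b directly or indirectly."""
    seen, stack = set(), list(calls.get(a, ()))
    while stack:
        s = stack.pop()
        if s == b:
            return True
        if s not in seen:
            seen.add(s)
            stack.extend(calls.get(s, ()))
    return False


def classify(bindings, calls):
    """Split bindings into (subfunctional, independent) sets."""
    subfunctional = {
        (p, r, q)
        for (p, r, q) in bindings
        if can_invoke(calls, p, q) or can_invoke(calls, q, p)
    }
    return subfunctional, bindings - subfunctional


# Hypothetical program: MAIN calls READ, SORT, and PRINT.
modifies = {"READ": {"BUF"}, "MAIN": {"COUNT"}}
accesses = {"PRINT": {"BUF", "COUNT"}, "SORT": {"BUF"}, "MAIN": {"COUNT"}}
calls = {"MAIN": ["READ", "SORT", "PRINT"]}

bindings = data_bindings(modifies, accesses)
subfunctional, independent = classify(bindings, calls)
print(len(bindings), len(subfunctional), len(independent))  # 3 1 2
```

Here (MAIN,COUNT,PRINT) is subfunctional because MAIN
invokes PRINT, while the two bindings from READ through BUF
are independent because READ, SORT, and PRINT never invoke
one another.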

(44) Cyclomatic complexity [McCabe 76] is a
graph-theoretic measure of control-flow complexity. For an
implementation language with only structured control
structures (such as SIMPL-T), this measure is dependent only
on the number of predicates (i.e., Boolean expressions
governing flow of control) in the source code. The
cyclomatic complexity v(P) of a program P with $\Pi$
predicates strewn among s segments is computed as

    $v(P) = \Pi + s$ ;

the cyclomatic complexity v(S) of a segment S with $\pi$
predicates is computed as

    $v(S) = \pi + 1$ .

See Chapter V for further discussion of the cyclomatic
complexity metrics.

(45) Four definitional variations of the basic
cyclomatic complexity measure were examined in this study in
order to explore alternatives for identifying predicates and
for handling case statement constructs. Under the SIMPPRED
alternative, simple Boolean subexpressions joined by and or
or connectives are each counted as predicates. Under the
COMPPRED alternative, only each complete Boolean expression
is counted as a predicate. Under the NCASE alternative,
each case statement construct is counted as contributing n
predicates, where n is the number of "cases" involved.
Under the LOGCASE alternative, each case statement construct
is counted as contributing $\lfloor \log_2 n \rfloor$*
predicates, thus giving a discount for case statement
constructs relative to series of nested ifthenelse
constructs.

See Chapter V for further discussion of the cyclomatic
complexity metrics.

* The notation $\lfloor x \rfloor$ signifies the greatest
integer less than or equal to x.

(46) For each of the definitional variations, the
CYCLOMATIC COMPLEXITY\...\TOTAL aspect measures the
cyclomatic complexity of the entire program. It is simply
the sum of cyclomatic complexity values for the individual
segments comprising the program.

See Chapter V for further discussion of the cyclomatic
complexity metrics.

(47) For each of the definitional variations, the
CYCLOMATIC COMPLEXITY\...\#SEGS:CC>=10 aspect counts the
number of segments in the program whose cyclomatic
complexity values equal or exceed the threshold value 10.

See Chapter V for further discussion of the cyclomatic
complexity metrics.

(48) For each of the definitional variations, a common
descriptive statistic of the empirical distribution of
cyclomatic complexity values from the individual segments
comprising an entire program was used as a vehicle
for measuring the general level of cyclomatic complexity
within the relatively nontrivial segments of the program.
This descriptive statistic, known as a quantile [Conover 71,
pp. 31-32, pp. 72-73], can be loosely described (in the
discrete case) as the value (of the random variable in
question) corresponding to a particular fixed probability
level on the cumulative relative frequency curve
(representing the distribution of that random variable).

The CYCLOMATIC COMPLEXITY\...\f QUANTILE POINT VALUE
aspects are defined to measure the largest integer x such
that the fraction of cyclomatic complexity values which are
less than x is less than or equal to the fixed fraction
f. The CYCLOMATIC COMPLEXITY\...\f QUANTILE TAIL AVERAGE
aspects are defined to measure the average of cyclomatic
complexity values greater than or equal to the f quantile
point value. Several particular quantiles were examined in
this study: the 0.5 quantile is closely related to the
distribution's median, and the 0.7, 0.8, and 0.9 quantiles
provide a series of increasingly smaller tails of the
distribution.

See Chapter V for further discussion of the cyclomatic
complexity metrics.
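
The quantile point value and quantile tail average of note
48 can be sketched in Python directly from their
definitions. The function names and the list of per-segment
complexity values below are hypothetical, and the downward
scan is simply one straightforward reading of "largest
integer x such that ..."; it assumes integer values and a
fraction f with 0 <= f < 1.

```python
def quantile_point(values, f):
    """Largest integer x such that the fraction of values below x is <= f."""
    if not values or not 0 <= f < 1:
        raise ValueError("need a nonempty list and 0 <= f < 1")
    n = len(values)
    # Scan downward from the maximum; the minimum always qualifies
    # because no value lies below it.
    for x in range(max(values), min(values) - 1, -1):
        if sum(v < x for v in values) / n <= f:
            return x


def tail_average(values, f):
    """Average of the values at or above the f quantile point value."""
    x = quantile_point(values, f)
    tail = [v for v in values if v >= x]
    return sum(tail) / len(tail)


# Hypothetical cyclomatic complexity values for a ten-segment program.
cc = [1, 1, 2, 2, 3, 3, 4, 6, 9, 14]
print(quantile_point(cc, 0.5), tail_average(cc, 0.5))  # 3 6.5
print(quantile_point(cc, 0.8), tail_average(cc, 0.8))  # 9 11.5
```

Raising f from 0.5 to 0.8 shrinks the tail from six
segments to two, so the tail average concentrates on the
most complex segments.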


(49) According to software science theory [Halstead
77], several interesting quantities can be computed from the
source code of a program and used to measure characteristics
of both the abstract algorithm and its expression as
implemented. All of these software science quantities are
computed in terms of the number of conceptually unique
"operators" and "operands" and the total occurrences of such
"operators" and "operands" within a program. In this study,
these "operators" and "operands" were identified
syntactically according to a set of rules established for
the implementation language SIMPL-T.

See Chapter V for further discussion of the software
science metrics.

(50) Given the basic parameters of software science:

    total "operator" count              $N_1$
    total "operand" count               $N_2$
    unique "operator" count             $\eta_1$
    unique "operand" count              $\eta_2$
    unique potential "operand" count    $\eta_2^*$

the following formulas define the software science
quantities examined in this study:

    VOCABULARY              $\eta = \eta_1 + \eta_2$
    LENGTH                  $N = N_1 + N_2$
    ESTIMATED LENGTH        $\hat{N} = (\eta_1 \log_2 \eta_1) + (\eta_2 \log_2 \eta_2)$
    %DIFFERENCE(N,$\hat{N}$)  $= |\hat{N} - N| / N$
    VOLUME                  $V = N \log_2 \eta$
    POTENTIAL VOLUME        $V^* = (2 + \eta_2^*) \log_2 (2 + \eta_2^*)$
    PROGRAM LEVEL           $L = V^* / V$
    DIFFICULTY              $D = 1 / L$
    INTELLIGENCE CONTENT    $I = (2 / \eta_1) (\eta_2 / N_2) V$
    %DIFFERENCE(V*,I)       $= |I - V^*| / V^*$
    LANGUAGE LEVEL          $\lambda = L V^*$
    EFFORT                  $E = V D$
    ESTIMATED TIME          $T = E / S$
    ESTIMATED BUGS          $B = L E / E_0$

where $S$ and $E_0$ are psychologically determined constants.

See Chapter V for further discussion of the software
science metrics.
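
As an illustration, the defining formulas of note 50 can be
evaluated with the following Python sketch. It is not the
procedure used in the study: the function name, the
dictionary keys, and the sample counts are hypothetical; the
unique potential "operand" count is simply taken as an
input; and the constants S and E0 must be supplied by the
caller (the values 18 and 3000 in the example are
placeholders chosen for illustration, not values taken from
this report).

```python
import math


def software_science(N1, N2, n1, n2, n2_star, S, E0):
    """Evaluate the software science quantities from the basic parameters.

    N1, N2:  total "operator" and "operand" counts
    n1, n2:  unique "operator" and "operand" counts
    n2_star: unique potential "operand" count
    S, E0:   caller-supplied constants
    """
    eta = n1 + n2
    N = N1 + N2
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)
    V = N * math.log2(eta)
    V_star = (2 + n2_star) * math.log2(2 + n2_star)
    L = V_star / V
    D = 1 / L
    I = (2 / n1) * (n2 / N2) * V
    E = V * D
    return {
        "vocabulary": eta,
        "length": N,
        "estimated_length": N_hat,
        "pct_diff_length": abs(N_hat - N) / N,
        "volume": V,
        "potential_volume": V_star,
        "program_level": L,
        "difficulty": D,
        "intelligence_content": I,
        "pct_diff_volume": abs(I - V_star) / V_star,
        "language_level": L * V_star,
        "effort": E,
        "estimated_time": E / S,
        "estimated_bugs": L * E / E0,
    }


# Hypothetical counts; S and E0 are illustrative placeholders only.
q = software_science(N1=64, N2=64, n1=16, n2=16, n2_star=2, S=18.0, E0=3000.0)
print(q["vocabulary"], q["length"], q["volume"], q["potential_volume"])
```

Note that the ESTIMATED BUGS formula reduces to V / E0,
since L * E equals V by the definitions of D and E.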


(51) Two different calculation methods were employed
in the study to compute the subset of software science
quantities whose exact values cannot be obtained directly
(via the defining formulas) from a program's source code.
These calculation methods each rely upon a different
estimation technique to obtain approximate values for these
quantities. The 1ST CALCULATION METHOD relies upon the
commonly accepted theoretical estimate of the program level
quantity; the 2ND CALCULATION METHOD relies upon an
internally applied empirical estimate of the language level
quantity.

See Chapter V for further discussion of the software
science metrics.


(99) Several instances of duplicate programming
aspects exist in the Table 1 listing. That is, some
logically unique aspects reappear with another label, for
reasons explained above. Listed below are the pairs of
duplicate programming aspects that were considered in this
study:

    FUNCTION CALLS               <=>  INVOCATIONS\FUNCTION
        NONINTRINSIC             <=>      NONINTRINSIC
        INTRINSIC                <=>      INTRINSIC

    STATEMENT TYPE COUNTS\
    (PROC)CALL                   <=>  INVOCATIONS\PROCEDURE
        NONINTRINSIC             <=>      NONINTRINSIC
        INTRINSIC                <=>      INTRINSIC

    AVERAGE INVOCATIONS PER      <=>  AVERAGE INVOCATIONS PER
    (CALLING) SEGMENT\                (CALLED) SEGMENT
    NONINTRINSIC

    SOFTWARE SCIENCE             <=>  SOFTWARE SCIENCE
    QUANTITIES\INTELLIGENCE           QUANTITIES\1ST CALCULATION
    CONTENT                           METHOD\POTENTIAL VOLUME

By definition, the data scores obtained for any pair of
duplicate aspects will be identical, and thus the same
statistical conclusions will be reached for both aspects.
V. Discussion of Metrics


This chapter provides an in-depth discussion of the
elaborative programming aspects examined in the study. The
material is presented, in a tutorial fashion, in order to
motivate their appeal as software metrics and to explain how
they might be interpreted. The reader who is acquainted
with one or more of these measures might consider skimming
the corresponding sections.


Program Changes


The program changes metric pertains to textual
revisions made to program source code during development,
from the time a program is first presented to the computer
system, to the completion of the project. The metric's
definition is framed so that one program change approximates
one conceptual change to the program. The following rules
for identifying program changes are reproduced from
[Dunsmore 78, pp. 19-20]:

"The following text changes to a program represent one
program change:

1. One or more changes to a single statement.

(Even multiple character changes to a
statement represent mental activity with only
a single abstract instruction.)

2. One or more statements inserted between existing
statements.

(The contiguous group of statements inserted
probably corresponds to the concrete
statements that represent a single abstract
instruction.)

3. A change to a single statement followed by the
insertion of new statements.

(This instance probably represents a
discovery that an existing statement is
insufficient and that it must be altered and
supplemented by additional statements to
implement the abstract instruction
involved.)


"However, the following text changes to a program are
not counted as program changes:

1. The deletion of one or more statements.

(Deleted statements must usually be replaced
by other statements elsewhere. The inserted
statements are counted; counting deletions as
well would give double weight to such a
change. Occasionally statements are deleted
but not replaced; these were probably being
used for debugging purposes and their
deletion requires little mental activity.)

2. The insertion of standard output statements or
insertion of special compiler-provided debugging
statements.

(These are occasionally inserted in a
wholesale fashion during debugging. When the
problem is found, these are then all removed,
and the necessary program change takes
place.)

3. The insertion of blank lines, insertion of
comments, revision of comments, and reformatting
without alteration of existing statements.

(These are all judged to be cosmetic in
nature.)"

Program changes are counted algorithmically by comparing the
source code from each pair of consecutive compilations of a
module (or logical predecessor thereof) and applying the
identification rules. Thus the total number of program
changes is a measure of the amount of textual revision to
source code during (postdesign) system development.
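
To make the counting idea concrete, the following Python
sketch applies a simplified reading of the identification
rules to two consecutive versions of a module. It is not the
algorithm of [Dunsmore 78]: it relies on Python's standard
difflib module to align statements, treats each list element
as one statement, and assumes the caller supplies an
ignorable predicate that recognizes blank lines, comments,
and debugging output for the language at hand. Counting a
replaced run of k statements by m statements as min(k, m)
changes is likewise an assumption made for illustration.

```python
import difflib


def count_program_changes(old, new, ignorable=lambda s: not s.strip()):
    """Approximate the number of program changes between two versions.

    old, new:  lists of statements (one string per statement)
    ignorable: predicate for statements that never count (default: blank)
    """
    def prepare(statements):
        # Drop ignorable statements and normalize spacing so that pure
        # reformatting is not seen as a change.
        return [" ".join(s.split()) for s in statements if not ignorable(s)]

    a, b = prepare(old), prepare(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            # A contiguous group of inserted statements is one change.
            changes += 1
        elif tag == "replace":
            # Each altered statement is one change; statements inserted
            # right after an altered one are absorbed into that change.
            changes += min(i2 - i1, j2 - j1)
        # "delete" and "equal" runs contribute nothing.
    return changes


# Hypothetical statements, not actual SIMPL-T source text.
v0 = ["X := 1", "Y := 2", "CALL P(X)"]
v1 = ["X := 1", "Y := 3", "CALL P(X)"]                       # one alteration
v2 = ["X := 1", "Y := 2", "", "Z := X + Y", "W := Z", "CALL P(X)"]  # one insertion
v3 = ["X := 1", "CALL P(X)"]                                 # deletion only
print(count_program_changes(v0, v1),
      count_program_changes(v0, v2),
      count_program_changes(v0, v3))  # 1 1 0
```

Summing this count over every pair of consecutive
compilations would give a total in the spirit of the metric,
subject to the simplifications noted above.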


The program changes metric may be interpreted as a
programming complexity measure, because textual revisions
are usually necessitated by errors encountered while
building, testing, and debugging software. Independent
research [Dunsmore & Gannon 77] has demonstrated a high
(rank order) correlation between total program changes (as
counted automatically according to a specific algorithm) and
total error occurrences (as tabulated manually from
exhaustive scrutiny of source code and test results) during
software implementation in the SIMPL-T programming language.
Thus empirical evidence justifies consideration of the
program changes metric as a direct measure of the relative
number of programming errors encountered outside of design
work. It is reasonable to assume that each textual revision
entails some expenditure of the programmer's effort (e.g.,
planning the revision, editing source code on-line). In
that sense, this metric may also be considered an indirect
measure of the level of human effort devoted to
implementation.


Cyclomatic Complexity

Control-flow complexity may be measured in terms of
cyclomatic complexity [McCabe 76], a graph-theoretic metric
that is independent of physical size (i.e., insertion or
deletion of function statements leaves the measure
unchanged) and dependent only on the decision structure of a
program. The cyclomatic number v(G) of a graph G having
n nodes, e edges, and p connected components is
defined as

    $v(G) = e - n + p$ .

In a strongly connected graph, the cyclomatic number is
equal to the minimum number of basis paths from which all
other paths may be constructed as linear combinations in an
edge-algebraic fashion (see [McCabe 76] for details). By
modeling the control flow of a program as a graph in the
traditional manner, the cyclomatic complexity measure is
defined to be the cyclomatic number of the graph
corresponding to the program's flow of control.
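
As a small illustration of the graph-theoretic definition,
the Python sketch below computes e - n + p for an explicit
graph. The function name and the example graph are
hypothetical. The example models a single ifthenelse
construct and, following the usual device for making a
control-flow graph strongly connected, adds one edge from
the exit node back to the entry node; that added edge is an
assumption of the example rather than something spelled out
in the text above.

```python
def cyclomatic_number(nodes, edges):
    """Return e - n + p, where p is the number of connected components."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Edge direction is irrelevant when counting connected components.
    for a, b in edges:
        parent[find(a)] = find(b)
    p = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + p


# One ifthenelse: a single binary decision at "entry".
nodes = ["entry", "then", "else", "exit"]
edges = [
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
    ("exit", "entry"),  # added edge that makes the graph strongly connected
]
print(cyclomatic_number(nodes, edges))  # 2
```

With e = 5, n = 4, and p = 1 the result is 2, one more than
the number of predicates in the construct.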

For a structured language like SIMPL-T, it is not
necessary to construct a control-flow graph in order to
measure a program's cyclomatic complexity. The measure can
be computed directly from the source code simply by counting
the number of predicates (i.e., Boolean expressions
governing control flow), since the predicates of the program
correspond exactly to the binary-branching decision points
of the control-flow graph. It is easily shown, using a
lemma proven in [Mills 72], that for a segment S with $\pi$
predicates the segment's cyclomatic complexity is

    $v(S) = \pi + 1$

and for a program P with $\Pi$ predicates strewn among s
segments the program's cyclomatic complexity is

    $v(P) = \Pi + s$ .
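
The program-level formula follows from the segment-level
formula by summation, given that a program's cyclomatic
complexity is the sum of its segments' values. A short
derivation, in which $\pi_i$ (a symbol introduced here only
for illustration) denotes the number of predicates in the
i-th segment $S_i$:

```latex
\begin{align*}
v(P) &= \sum_{i=1}^{s} v(S_i)
      = \sum_{i=1}^{s} (\pi_i + 1) \\
     &= \Bigl(\sum_{i=1}^{s} \pi_i\Bigr) + s
      = \Pi + s .
\end{align*}
```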

This measure originated as an absolute count of the
maximum number of linearly independent execution paths
through a segment, in the graph-theoretic edge-algebraic
sense alluded to above. Since each of these paths merits
individual testing, the measure was proposed to serve as a
quantitative indicator of the difficulty of testing a given
segment to a certain degree of thoroughness. Testability is
clearly an issue closely related to software complexity in
general, and a program's cyclomatic complexity may be viewed
as one quantitative measure of its control-structure
complexity.

Definitional Variations

Several variations of the basic cyclomatic complexity
measure were considered, because there are at least two
definitional issues for which intuitively motivated
alternatives lead to meaningful variations.

One of these issues is the weighting given to instances
of case statement constructs. The original definition of
cyclomatic complexity views a case statement as the
semantically equivalent series of nested ifthenelse
statements: each case statement contributes n units of
cyclomatic complexity, where n is the number of individual
"cases" involved. It can be argued, however, that a case
statement deserves a smaller contribution to cyclomatic
complexity since its inherent uniformity and readability
have a moderating effect on programmer-perceived complexity
(relative to an explicit series of nested ifthenelse
statements). One reasonable alternative views each case
statement as contributing $\lfloor \log_2 n \rfloor$* units
of cyclomatic complexity, where n is the number of
individual "cases" involved. This logarithmic weighting is
appropriate since a case statement's moderating effect seems
to increase with the number of "cases" involved.

The other issue is the manner of counting predicates.
The original definition counts simple (sub)predicates
individually, so that the compound predicate

    (I < J) and ((A(I) = A(J)) or (not SORTED))

would contribute three units of cyclomatic complexity, for
example. An alternative definition considers each complete
predicate as an indivisible part of a program, contributing
one unit of cyclomatic complexity. The motivation is that
the complete predicate represents a single abstract
condition governing the flow of control. Note that this
issue is the basis for a proposed extension [Myers 77] to
the original cyclomatic complexity measure. This issue also
affects the way individual "cases" of a case statement
construct are identified and counted. The original
definition counts each case label separately, since multiple
case labels on the same case branch are semantically
equivalent to simple predicates joined by or's to form the
Boolean expression governing the case branch. The
alternative definition counts only the case branches
themselves, regardless of case label multiplicity. In
parallel with the motivation given above, multiple case
labels on a case branch represent a single abstract
condition governing that branch (e.g., the set of case
labels 0, 1, 2, ..., 9 may be abstracted to digit).

* The notation $\lfloor x \rfloor$ signifies the greatest
integer less than or equal to x.

This study examined the four variations of cyclomatic
complexity defined as follows for the SIMPL-T programming
language:

SIMPPRED-NCASE -- Simple predicates contribute 1 unit;
case statements contribute 1 unit for each case
label.

SIMPPRED-LOGCASE -- Simple predicates contribute 1
unit; case statements contribute $\lfloor \log_2 n \rfloor$
units, where n is the number of case labels.

COMPPRED-NCASE -- Compound predicates contribute 1
unit; case statements contribute 1 unit for each
case branch; multiple case labels on the same case
branch are disregarded.

COMPPRED-LOGCASE -- Compound predicates contribute 1
unit; case statements contribute $\lfloor \log_2 n \rfloor$
units, where n is the number of case branches;
multiple case labels on the same case branch are
disregarded.

Note that the SIMPPRED-NCASE variation of cyclomatic
complexity is McCabe's original measure.
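
The four variations can be compared on a single segment
with the following Python sketch. The function name, its
arguments, and the way a segment is summarized (a list of
subpredicate counts plus a list of case statements) are
hypothetical conveniences; the study itself identified
predicates and cases from SIMPL-T source text. The sketch
adds 1 to the unit total, following the segment formula in
which complexity is the number of predicates plus one.

```python
def cyclomatic_complexity(predicates, case_statements,
                          simple_predicates=True, log_case=False):
    """Cyclomatic complexity of one segment under a chosen variation.

    predicates:      one entry per complete Boolean expression, giving the
                     number of simple subpredicates it contains
    case_statements: one entry per case statement; each entry lists the
                     number of case labels on each branch
    """
    units = sum(predicates) if simple_predicates else len(predicates)
    for labels_per_branch in case_statements:
        if simple_predicates:
            n = sum(labels_per_branch)   # count every case label
        else:
            n = len(labels_per_branch)   # count case branches only
        # n.bit_length() - 1 equals floor(log2(n)) exactly for n >= 1.
        units += (n.bit_length() - 1) if log_case else n
    return units + 1


# Hypothetical segment: one compound predicate with three simple parts,
# one simple predicate, and one case statement whose three branches carry
# 10 labels (say, the digits), 1 label, and 1 label.
preds = [3, 1]
cases = [[10, 1, 1]]
print(cyclomatic_complexity(preds, cases, True, False))   # SIMPPRED-NCASE: 17
print(cyclomatic_complexity(preds, cases, True, True))    # SIMPPRED-LOGCASE: 8
print(cyclomatic_complexity(preds, cases, False, False))  # COMPPRED-NCASE: 6
print(cyclomatic_complexity(preds, cases, False, True))   # COMPPRED-LOGCASE: 4
```

The same segment thus scores anywhere from 4 to 17
depending on the variation, which shows how strongly the
definitional choices can affect the measured value.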

Techniques for Application

There are several ways to apply the cyclomatic
complexity measure (or variations thereof) to an entire
program in order to obtain a metric for its overall
control-flow complexity. First of all, the metric is defined
directly for a program composed of individual segments: a
program's total cyclomatic complexity is simply the sum of
its segments' cyclomatic complexities. However, this total
cyclomatic complexity measure is not particularly useful as
a basis for comparing entire programs because it is, in a
certain sense, insensitive to the program's modularization.
As a metric, the total cyclomatic complexity of a program is
(by definition) a linear function in two variables, the
number of predicates and the number of segments. A subtle
trade-off relationship exists between these two variables,
such that substantial fluctuation in the metric's value can
arise from simpleminded changes to a program's
modularization alone.
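
A small worked example, using made-up numbers, shows how
repartitioning alone can move the total. Suppose a program
contains $\Pi = 30$ predicates, and assume (purely for
illustration) that dividing it into more segments neither
adds nor removes predicates:

```latex
\begin{align*}
\text{3 segments:}  \quad v(P) &= \Pi + s = 30 + 3  = 33 \\
\text{10 segments:} \quad v(P) &= \Pi + s = 30 + 10 = 40
\end{align*}
```

Under that assumption the total rises by 7 units, roughly
21 percent, although the decision logic is unchanged.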

A better comparison of entire programs is afforded by
focusing attention upon the cyclomatic complexity values of
individual segments and upon instances of segments with high
values of the metric. McCabe originally proposed the number
10 as a reasonable threshold value for a segment's
cyclomatic complexity. Segments exceeding this threshold
need to be recoded or decomposed into smaller segments in
order to attain an acceptable level of testability and
control-flow complexity. Hence, a second way to apply the
cyclomatic complexity measure to an entire program is to
count the number of segments whose cyclomatic complexity
value exceeds this threshold. In this case, the basis for
comparing entire programs is the frequency of segments with
unacceptably high cyclomatic complexity.
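
The first two techniques can be illustrated with a minimal
sketch in Python (not the tooling used in this study). It
assumes that each segment's cyclomatic complexity is its
predicate count plus one, as in McCabe's original measure;
the function names and the sample predicate counts are
hypothetical.

```python
def segment_complexity(predicates):
    """Cyclomatic complexity of one segment: predicate count + 1."""
    return predicates + 1


def total_complexity(predicate_counts):
    """Sum of the segments' complexities (linear in predicates and segments)."""
    return sum(segment_complexity(p) for p in predicate_counts)


def count_over_threshold(predicate_counts, threshold=10):
    """Number of segments whose complexity exceeds the threshold."""
    return sum(1 for p in predicate_counts
               if segment_complexity(p) > threshold)


# The same 12 predicates, packaged as one segment or as three.
monolithic = [12]
split = [4, 4, 4]

print(total_complexity(monolithic), total_complexity(split))
print(count_over_threshold(monolithic), count_over_threshold(split))
```

In this illustration the total changes from 13 to 15 purely
because of the repackaging, while the count of segments over
the threshold drops from 1 to 0.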

Finally, it would be desirable to compare the full
spectrum of cyclomatic complexity values for the individual
segments of one program against that of another program,
since considerable diversity often exists. Programs
typically contain several small segments with very low
cyclomatic complexity values (e.g., a function to compute
the average of a vector) and a few large segments with high
cyclomatic complexity values. Being easily understood and
tested, the small segments are relatively inconsequential,
while the large segments contain the substance of the
program and contribute most of the consequential
control-flow complexity. Ideally, one wishes to disregard the
"smaller" cyclomatic complexity values and summarize the
magnitude and frequency of the "larger" cyclomatic
complexity values via a single quantitative indicator but
do so in a flexible, normalized fashion, where "smaller" and
"larger" are determined relative to one another within each
program.

This ideal can be approximated by means of the
quantiles of the empirical distribution of cyclomatic
complexity values across the segments comprising the program
(see Figure 1). Quantiles are a standard tool from
descriptive statistics [Conover 71, pp. 31-32, pp. 72-73],
commonly used to summarize the "shape" and "position" of a
distribution function, especially its upper tail region.

Both the quantile point value (i.e., the largest integer x
such that the fraction of cyclomatic complexity values which
are less than x is less than or equal to some fixed
fraction) and the quantile tail average (i.e., the average
of cyclomatic complexity values greater than or equal to the
quantile point value) are normalized ways to quantify just
how high the cyclomatic complexity is for the relatively
nontrivial segments of a program. Several different
quantiles were examined: the 0.5 quantile is closely related
to the median of the distribution, and the 0.7, 0.8, and 0.9
quantiles provide a series of increasingly smaller tails of
the distribution. Thus, the basis for this third comparison
of entire programs is a series of quantitative descriptors
of the empirical distribution of cyclomatic complexity
values within a program.
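
The two quantile descriptors defined above can be sketched
in Python as follows. This is only an illustrative reading
of the definitions, not the procedure actually used in the
study; the function names and the sample list of segment
complexities are hypothetical.

```python
def quantile_point(values, fraction):
    """Largest integer x such that the fraction of values < x is <= fraction."""
    if not values or not 0 <= fraction < 1:
        raise ValueError("need a nonempty list and 0 <= fraction < 1")
    n = len(values)
    point = min(values)  # the fraction below the minimum is always 0
    for x in range(min(values), max(values) + 2):
        below = sum(1 for v in values if v < x)
        if below / n <= fraction:
            point = x
    return point


def quantile_tail_average(values, fraction):
    """Average of the values greater than or equal to the quantile point."""
    point = quantile_point(values, fraction)
    tail = [v for v in values if v >= point]
    return sum(tail) / len(tail)


# Hypothetical cyclomatic complexities for the segments of one program.
complexities = [1, 1, 2, 2, 3, 4, 5, 8, 12, 20]
for f in (0.5, 0.8):
    print(f, quantile_point(complexities, f),
          quantile_tail_average(complexities, f))
```

For this sample the 0.5 quantile point value is 4 with a tail
average of 9.8, and the 0.8 quantile point value is 12 with a
tail average of 16; raising the fraction isolates a smaller
and more extreme tail.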


Data Bindings

The data bindings metrics [Stevens, Myers & Constantine
74; Basili & Turner 75; Turner 76] originated as a way to
quantify a certain kind of connectivity (i.e., directed
communication between segments via global variables) within
a program. Their motivation is based on the intuitive
principle that the logical complexity of a composite system
is a function of the multiplicity of connections among its
component parts (cf. [Simon 69]).


FIGURE 1

Figure 1. Frequency Distribution of Cyclomatic Complexity

Both the absolute and the relative-cumulative frequency
distribution of cyclomatic complexity values from 47
segments comprising an entire program are plotted. The tail
region associated with the 0.8 quantile is shaded on each
plot.


A segment-global-segment data binding (p,r,q) is an
occurrence of the following arrangement in a program: a
segment p modifies a global variable r that is also
accessed by a segment q , with segment p different from
segment q . The existence of a data binding (p,r,q)
suggests that the behavior of segment q is probably
dependent on the performance of segment p through the data
variable r , whose value is set by p and fetched by q .
The binding (p,r,q) is different from the binding
(q,r,p) which may also exist; occurrences such as
(p,r,p) are not counted as data bindings. Thus each data
binding represents a unique communication path between a
pair of segments via a global variable. As a metric, the
total number of segment-global-segment data bindings
reflects the degree of that kind of connectivity within a
program.

Data bindings may be counted in three different ways:
actual, possible, and relative percentage. (Bear in mind
that, since these measures are determined statically from
the source code, the terms "actual" and "possible" refer to
a program's syntactic form only.)

First, the actual count is the absolute number of data
bindings (p,r,q) actually coded in the program: segment
p contains a statement modifying global variable r , and
segment q contains a statement accessing r . This count
of actual data bindings represents the degree of realized
connectivity in the program. Second, the possible count is
the absolute number of data bindings (p,r,q) that could
possibly be allowed under the program's structure of segment
definitions and global variable declarations: the scope of
global variable r merely contains both segment p and
segment q , so that segment p could potentially modify
r and segment q could potentially access r . This
count of possible data bindings represents the degree of
potential connectivity, in a "worst-case" sense. It is
computed as the sum of terms s*(s-1) for each global
variable, where s is the number of segments in that
global's scope; thus, it is heavily influenced (numerically
speaking) by the sheer number of segments in a program.

Third, the relative percentage is a way of normalizing the
absolute numbers of data bindings, since it is simply the
quotient (expressed as a percentage) of actual data bindings
divided by possible data bindings. It represents the degree
of realized connectivity relative to potential connectivity.
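
The three counts can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not the
analyzer used in the study: the static set/fetch information
and the scope of each global are supplied by hand as
dictionaries, and all segment and variable names are
hypothetical.

```python
from itertools import product


def actual_bindings(modifies, accesses):
    """All (p, r, q) with p modifying r, q accessing r, and p != q.

    modifies/accesses map each segment to the set of globals it sets/fetches.
    """
    bindings = set()
    for p, q in product(modifies, accesses):
        if p != q:
            for r in modifies[p] & accesses[q]:
                bindings.add((p, r, q))
    return bindings


def possible_bindings(scope):
    """Sum of s*(s-1) over the globals; scope maps a global to its segments."""
    return sum(len(s) * (len(s) - 1) for s in scope.values())


def relative_percentage(modifies, accesses, scope):
    """Actual bindings as a percentage of possible bindings."""
    possible = possible_bindings(scope)
    if possible == 0:
        return 0.0
    return 100.0 * len(actual_bindings(modifies, accesses)) / possible


# Hypothetical program: three segments sharing two globals.
modifies = {"init": {"stack", "top"}, "push": {"stack", "top"}, "report": set()}
accesses = {"init": set(), "push": {"stack", "top"}, "report": {"top"}}
scope = {"stack": {"init", "push", "report"},
         "top": {"init", "push", "report"}}

print(len(actual_bindings(modifies, accesses)))
print(possible_bindings(scope))
print(round(relative_percentage(modifies, accesses, scope), 1))
```

Here 4 actual bindings are realized out of 12 possible ones,
for a relative percentage of about 33.3.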

Actual data bindings may also be subcharacterized on
the basis of the invocation relationship between the two
segments. A data binding (p,r,q) is subfunctional if
either of the two segments p or q can invoke the other,
whether directly or indirectly (via a chain of intermediate
invocations involving other segments). In this situation,
the functioning of the one segment may be viewed as
contributing to the overall functioning of the other
segment. A data binding (p,r,q) is independent if neither
of the two segments p or q can invoke the other, whether
directly or indirectly. The transitive closure of the call
graph among the segments of a program is employed to make
this distinction between subfunctional and independent data
bindings.
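
The subfunctional/independent distinction can be sketched
in Python by computing reachability over the call graph.
This is an illustrative sketch only; the call graph is given
as a hypothetical dictionary from each segment to the
segments it invokes directly, and the function names are
assumptions.

```python
def reachable(calls, start):
    """Segments that start can invoke, directly or indirectly."""
    seen = set()
    stack = list(calls.get(start, ()))
    while stack:
        segment = stack.pop()
        if segment not in seen:
            seen.add(segment)
            stack.extend(calls.get(segment, ()))
    return seen


def classify(binding, calls):
    """Label a data binding (p, r, q) as subfunctional or independent."""
    p, _r, q = binding
    if q in reachable(calls, p) or p in reachable(calls, q):
        return "subfunctional"
    return "independent"


# Hypothetical call graph: main -> parse -> scan, and main -> emit.
calls = {"main": ["parse", "emit"], "parse": ["scan"], "scan": [], "emit": []}

print(classify(("main", "g", "scan"), calls))  # main invokes scan via parse
print(classify(("scan", "g", "emit"), calls))  # neither can invoke the other
```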

In some sense, all three measures dealing with
segment-global-segment data bindings (actual, possible, and
relative percentage) reflect quantifiable characteristics of a
program's "data modularization" (i.e., the static
organization of data definitions and references within
segments and modules).

In particular, the possible data bindings metric
reflects the general degree of encapsulation enforced by the
implementation language for global variables. One can
imagine two extremes of encapsulation for the same
collection of global variables and segments. On the one
hand, the program could be written (in the implementation
language SIMPL-T) as a single module containing all the
segments, with each global potentially accessible from every
segment. This modularization would maximize (explosively
so, due to the squaring of the number of segments) the
number of possible data bindings. On the other hand, the
program could be written (in the implementation language
SIMPL-T) as several modules, one for each segment, with
appropriate ENTRY and EXTernal declarations to provide each
segment with potential access to exactly those globals it
actually uses. This modularization would minimize the
number of possible data bindings (to precisely the number of
actual data bindings).

Moreover, the data bindings relative percentage metric
also reflects the general degree of (operational)
"globality" for the global variables declared in a program,
i.e., the extent to which globals are actually modified
(set) and accessed (fetched) by those pairs of segments that
could possibly do so. One can imagine two different
situations in which the relative percentage of data bindings
for a small set of otherwise equivalent global variables
(say, an array and an integer) would be extremely high and
extremely low, respectively. On the one hand, this global
array and global integer could be serving as a stack, and
nearly every segment that refers to these globals could be
both popping the stack to examine its contents and pushing
new items onto it. Here the two global variables are quite
central to the overall operation of that collection of
segments; their data binding relative percentage would be
close to one. On the other hand, this global array and
global integer could be serving as a buffer for a
varying-length vector that is initially produced (set) by one
segment and nondestructively consumed (fetched only) by
several other segments. Here the two global variables are
rather incidental to the overall operation of that
collection of segments, serving merely as a convenient
medium for disseminating information (which could also have
been achieved via parameter passing); their data binding
relative percentage would be close to zero.


Software Science Quantities

The software science quantities are a set of metrics
based upon the tenets of software science [Halstead 77], as
pioneered by Halstead and his colleagues. Billed as

"... a branch of experimental and theoretical science
dealing with the human preparation of computer programs
and other types of written material ...,"*
software science is concerned with measurable attributes of
algorithms or programs and with mathematical relationships
among those attributes. Software science is
characteristically actuarial in nature: its measures and
relationships may be inaccurate when applied to individual
programs, but they become surprisingly more accurate when
applied to large numbers of programs, such as are found in
large software development projects.

The software science quantities are all defined in
terms of certain frequencies of so-called "operators" and
"operands" appearing within an algorithm's functional
specification or a program's source code implementation.
Some of the quantities (e.g., vocabulary, length, volume)
are purely descriptive and provide the building-blocks of
the theory. A few (e.g., estimated length) are predictive
of other descriptive quantities within the theory. Several
(e.g., program level, language level, effort) claim to be
quantifications of fundamentally qualitative and intuitive
concepts. Still other quantities (e.g., estimated time,
estimated bugs) purport to measure (under ideal conditions)
externally observable and quantifiable programming
phenomena.

* The blocked quotations throughout this section are taken
from [Halstead 77].

Identification Criteria

The criteria for identifying "operators" and "operands"
(and their uniqueness) are important since they are the
foundation for measuring the software science quantities.
However, this identification is an area not clearly
addressed by the theory. Except for the Fortran programming
language, in which most of the pioneering work was done,
explicit standards or guidelines for identification do not
exist. For another language, a researcher can only attempt
to adapt and extend the principles that he personally judges
to be behind the Fortran work. The following "operator/
operand" identification criteria were designed for the
SIMPL-T programming language:

1. In general, only the portion of source code pertaining
to executable statements (after expansion of all
DEFINE-macros) is considered.

2. Constants and data variable identifiers are natural
"operands." Data structures (e.g., arrays, files) are
considered single objects and not decomposed into
components.

3. The input stream file and the output stream file are
counted as "operands," with implicit occurrences
recognized for each operation on these files.

4. The normal prefix unary and infix binary operator
symbols (i.e., for arithmetic, logical, and character-
string operations) are natural "operators."

5. The intrinsic procedures (e.g., READ, WRITE, REWIND),
type-coercion functions, and input/output operation
keywords (e.g., EJECT, SKIP) are "operators."

6. Segment invocations (i.e., procedure calls and function
references) are "operators."

7. Different types of statements or constructs are
considered individual "operators," as follows:

     :=  (assignment)
     IF...THEN...END
     IF...THEN...ELSE...END
     CASE...OF...END
     CASE...OF...ELSE...END
     WHILE...DO...END
     EXIT
     RETURN

8. Other delimiter patterns are considered "operators," as
follows:

     \...\  (caselabel designation "operator")
     [...]  (partword and substring "operators")
     (...)  (subexpression, array subscript,
            actual argument list, and function
            return value "operators")
     ,      (list item separation "operator")

9. Finally, implicit statement list brackets (associated
with pairs of keywords such as THEN...ELSE and
ELSE...END) are considered "operators," as are implicit
statement separators between consecutive statements of
the same statement list.

(The quotation marks flagging "operator" and "operand" as
technical terms are suppressed throughout the remainder of
this section for readability.)

Basic Parameters

The five basic parameters of software science are
determined in accordance with the criteria established for
identifying operators and operands.

The theory defines four basic parameters pertaining to
a program's implementation: the total operator count N1 ,
the total operand count N2 , the unique operator count
n1 , and the unique operand count n2 . The total counts
include all occurrences of operators/operands, while the
unique counts disregard multiple occurrences of the same
operator/operand. Although the issue of synonymy for
operators has already been dealt with in the identification
criteria, issues of synonymy for operands still remain. In
particular, formal parameters are considered to be
synonymous with corresponding actual arguments; therefore
occurrences of formal parameter identifiers contribute to
the total operand count but not to the unique operand count,
with respect to the entire program. This rule is not,
however, applied in the case of formal parameters passed by
value and modified by the segment; because these are
actually treated as special initialized-upon-entry local
variables in the implementation language SIMPL-T, they are
not considered to be synonymous operands with respect to the
entire program.

The theory defines four additional basic parameters,
analogous to those described above, but pertaining to an
algorithm's or program's "shortest possible or most succinct
form" (i.e., its "one-liner" functional specification,
conceived as an assignment statement or procedure call
involving a single built-in routine). These are the total
potential operator count N1* , the total potential operand
count N2* , the unique potential operator count n1* , and
the unique potential operand count n2* . (The modifier
"potential" and the superscripted asterisk distinguish
quantities pertaining to the functional specification from
analogous quantities pertaining to the implementation.) The
theory assumes, however, that the total potential operator/
operand counts must always equal the unique potential
operator/operand counts, because the most succinct
specification would contain no redundant occurrences of
operators/operands. An assumption is also made that the
unique potential operator count is always equal to the
constant 2, because

"... the minimum possible number of operators ... must
consist of one distinct operator for the name of the
function or procedure and another to serve as an
assignment or grouping symbol."

Thus, all software science quantities pertaining to a
program's specification are completely determined
(numerically speaking) by a single parameter, the unique
potential operand count n2* . It is the fifth basic
parameter of software science theory and is conceptually
equivalent to the number of "logically distinct input/output
parameters" for an algorithm or program. This count holds
considerable significance in both the theory and its
application, but unfortunately it is rather intractable for
most nontrivial programs (i.e., those whose specifications
are not easily stated as "one-liners" without gross
oversimplification). For example, some logically distinct
input parameters may appear as special constants embedded
within the code, and the number of logically distinct output
parameters represented within a printed report is often
unclear.

An alternative and more tractable conceptualization
defines this unique potential operand count simply as the
number of distinct operands busy-on-entry (i.e., initially
containing a value that is utilized or accessed by the
algorithm or program) plus the number of distinct operands
busy-on-exit (i.e., finally containing a value that was
furnished by the algorithm or program to be utilized or
accessed subsequently). For an individual segment, n2*
may be estimated from the implementation by counting all of
the global variables that are referenced, each of the formal
parameters, one for both the input stream file and the
output stream file (if they are read or written), and one
for the function return value (if the segment is a
function). It should be noted that this estimate is a lower
bound since it disregards the possibility that a formal
parameter which is passed by reference should be counted
twice because it is both busy-on-entry and busy-on-exit.

For an entire program, n2* may be estimated from the
implementation by counting one for both the input stream
file and the output stream file if the program reads or
writes them, plus one for the set of control bits or option
letters that might be used to regulate the program's
execution.

Thus, for the programs examined in this study, the
estimated number of unique potential operands is always
either 2 or 3, depending on whether

     output_listing := compile_and_execute( input_deck )

or

     output_listing :=
          compile_and_execute( input_deck, option_letters )

states their functional specification. It is clear that
this kind of estimation of n2* from the implementation is
considerably more accurate for individual segments than for
entire programs; this fact is partial motivation for the
particular estimation technique employed in the second
method of calculation discussed below.

Properties

The  derived  properties  of  software  science  are  defined 
in  terms  of  the  five  basic  parameters. 

The vocabulary n is defined as

     n = n1 + n2

and represents the cardinality of the set of logically
distinct "symbols" used to implement the program. The
length N is defined as

     N = N1 + N2

and represents the abstract size of the program's
implementation as measured in units of logically distinct
"symbols." This property is closely associated with the
number of syntactic tokens in the source code of a program;
it can be considered a refinement of the rudimentary TOKENS
aspect. The estimated length N^ is defined as

     N^ = (n1 * log2(n1)) + (n2 * log2(n2)) ,

reflecting one of the fundamental conjectures of the theory;
namely, that the observed length of a program's
implementation is a function solely of the number of unique
operators/operands involved.

Considerable empirical evidence has supported the
validity of this "length prediction equation" on the average
(i.e., major software studies have reported correlation
coefficients of between 0.95 and 0.99 for the relationship
between the length N and the estimated length N^
[Fitzsimmons & Love 78]). However, its accuracy for a given
program may be low; the theory attributes this to the
presence of so-called "impurities" indicating a lack of
polish in a program. These impurities include instances of
unnecessary redundancy and needless constructions, such as
inverse operations that cancel each other, common
subexpressions, or unreachable statements.

This has led some researchers to view the discrepancy
between the length N and the estimated length N^ as a
possible software quality measure. For these reasons, it
was desirable to examine the %DIFFERENCE(N,N^) aspect,
calculated as

     ( |N^ - N| ) / (N) ,

which normalizes the degree of discrepancy.
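
As a worked illustration of the vocabulary, length,
estimated length, and normalized discrepancy, the following
Python sketch evaluates the defining formulas for one set of
basic parameters. The function names and the sample counts
are hypothetical and are not taken from the programs in this
study.

```python
import math


def vocabulary(n1, n2):
    """Unique operators plus unique operands."""
    return n1 + n2


def length(N1, N2):
    """Total operator occurrences plus total operand occurrences."""
    return N1 + N2


def estimated_length(n1, n2):
    """Length prediction equation: n1*log2(n1) + n2*log2(n2)."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)


def difference(observed, estimate):
    """Normalized discrepancy |estimate - observed| / observed."""
    return abs(estimate - observed) / observed


# Hypothetical basic parameters for one implementation.
n1, n2, N1, N2 = 16, 32, 120, 100

N = length(N1, N2)
N_hat = estimated_length(n1, n2)
print(vocabulary(n1, n2), N, N_hat, round(difference(N, N_hat), 4))
```

For these counts the observed length is 220 and the estimated
length is 224, a normalized discrepancy of about 0.018.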

The volume V is defined as

     V = N * log2(n)

and represents the abstract size of the program's
implementation as measured in units of information-theoretic
bits. Specifically, it is the minimum number of bits
required to encode the implementation as a sequence of
fixed-width binary strings (since it is the product of the
total number of "symbols" and the minimum bandwidth required
to distinguish each of the unique "symbols"). The potential
volume V* is defined analogously as

     V* = N* * log2(n*)
        = (N1* + N2*) * log2(n1* + n2*)
        = (n1* + n2*) * log2(n1* + n2*)
        = (2 + n2*) * log2(2 + n2*) .


The potential volume of any algorithm or program is
theoretically independent of any language in which it might
be implemented; thus,

"Provided that n2* is evaluated as the number of
conceptually unique operands involved, V* appears to
be a most useful measure of an algorithm's content."
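
The volume and potential volume formulas can be evaluated
with a short Python sketch. The function names and the
sample values of N and n are hypothetical; the potential
operand counts of 2 and 3 are the two estimates mentioned
earlier for the programs in this study.

```python
import math


def volume(N, n):
    """V = N * log2(n): total symbols times bits needed per unique symbol."""
    return N * math.log2(n)


def potential_volume(n2_star):
    """V* = (2 + n2*) * log2(2 + n2*), taking n1* = 2 as the theory assumes."""
    return (2 + n2_star) * math.log2(2 + n2_star)


# Hypothetical implementation: 220 symbol occurrences over 64 unique symbols.
print(volume(220, 64))

# Potential volume for specifications with 2 or 3 potential operands.
print(potential_volume(2), round(potential_volume(3), 2))
```

With these values V is 1320 bits, while V* is 8 bits for
n2* = 2 and about 11.61 bits for n2* = 3, which shows how
small the potential volume is relative to the volume of a
sizable implementation.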

The program level L is defined as

     L = V* / V

and, as a ratio of volumes, can only take on values between
zero and one; it quantifies the intuitive concept of "level
of abstraction" for an implementation. Since the potential
volume of any given algorithm is constant, the formula
indicates an inverse relationship (as desired intuitively)
between level of abstraction (measured by L ) and size
(measured by V ). The theory also attaches meaning to the
reciprocal of program level, defining the difficulty D as

     D = 1 / L

which may alternatively be viewed as the amount of
redundancy within an implementation.

Unfortunately, this definition of program level is not
particularly useful since it is difficult (as discussed
above) to determine exactly a program's unique potential
operand count n2* or its potential volume V* . Desiring
to be able to measure program level even if these quantities
were unavailable, Halstead conjectured that an estimated
program level L^ , defined as

     L^ = (n1* / n1) * (n2 / N2)
        = (2 / n1) * (n2 / N2) ,

could be measured directly from an implementation alone,
without a specification or the unique potential operand
count n2* . With only limited evidence supporting the
validity of this estimate, the theory makes the qualified
claim that

"... for many purposes L and L^ may be used
interchangeably to specify the level at which a program
has been implemented, at least for smaller programs."
Although most software science studies (e.g., [Elshoff 76;
Love & Bowman 76; Curtis et al. 79]) have had no choice but
to rely upon this "program level prediction equation" to
calculate the derived properties, the program level
estimator L^ has been criticized publicly [Oldehoeft 77]
for unsatisfactory behavior under certain conditions. The
questionable validity of this prediction equation is the
principal motivation for considering the two alternative
methods of calculation discussed below.


CHAPTER  V 


In any event, the theory employs the estimated program
level L^ internally, defining the intelligence content I as

     I = L^ * V

and proposing it as a measure of "how much is said" in an
algorithm or program (i.e., its information content). This
intelligence content quantity represents the amount of
detail expressed in an implementation but weighted by its
level of expression. By definition, I is determinable
from an implementation alone. If there is a strong
relationship between L and L^ , intelligence content I
would be approximately equal to potential volume V* . In
fact, Halstead originally demonstrated that the removal of
program "impurities" (as described above) consistently
improved the numerical agreement between V* and I .
Normalizing the degree of discrepancy between these two
quantities, the %DIFFERENCE(V*,I) aspect, calculated as

     ( |I - V*| ) / (V*) ,

may be interpreted as another possible software quality
measure, according to the theory.
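
The estimated program level, the intelligence content, and
their normalized discrepancy from the potential volume can
be sketched in Python as follows. The function names and
sample values are hypothetical, and the potential volume used
for comparison is itself only an assumed estimate.

```python
import math


def estimated_program_level(n1, n2, N2):
    """Estimated level (2 / n1) * (n2 / N2), measured from the implementation."""
    return (2 / n1) * (n2 / N2)


def intelligence_content(level_estimate, V):
    """I = estimated program level times volume."""
    return level_estimate * V


def difference(reference, value):
    """Normalized discrepancy |value - reference| / reference."""
    return abs(value - reference) / reference


# Hypothetical implementation: n1 = 16, n2 = 48, N1 = 124, N2 = 96.
n1, n2, N1, N2 = 16, 48, 124, 96
V = (N1 + N2) * math.log2(n1 + n2)        # 220 * log2(64) = 1320 bits
level = estimated_program_level(n1, n2, N2)
I = intelligence_content(level, V)

# Assumed potential volume for a specification with n2* = 3.
V_star = (2 + 3) * math.log2(2 + 3)
print(level, I, round(difference(V_star, I), 2))
```

In this illustration the estimated level is 0.0625 and I is
82.5 bits, far from the assumed V* of about 11.61 bits; a gap
of this kind is what the %DIFFERENCE(V*,I) aspect normalizes.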

The language level lambda is defined as

     lambda = L * V*
            = L * L * V
            = V* * V* / V

and claims to quantify the popular intuitive concept known
by the same name. The theory suggests that lambda should
remain relatively constant for any particular implementation
language while the implemented algorithm itself is allowed
to vary. Empirical evidence from a carefully constructed
set of programs, each implemented in several common
programming languages, indicated that the ordering of mean
values for lambda (which ranged from about 0.8 for assembly
language to about 1.6 for PL/1) concurred exactly with the
generally accepted intuitive ordering of the languages
themselves.



The effort E is defined as

     E = V * D
       = V / L
       = V * V / V*

but this quantity does not purport to measure development
effort in the usual sense. Rather, the theory originally
restricted

"... the concept of programming effort to be the mental
activity required to reduce a preconceived algorithm to
an actual implementation in a language in which the
implementor (writer) is fluent ..."
According to further elaboration of the theory [Gordon 79],
this property represents the effort required (under ideal
conditions) to comprehend an implementation rather than to
produce it; E may thus be interpreted as a measure of
program clarity. The effort property is considered to have
the dimension either of bits or of "elementary mental
discriminations." Borrowing from research in psychology,
the theory converts this amount of mental effort into an
externally observable duration of time, defining the
estimated time T^ as

     T^ = E / S

where S is the so-called Stroud rate, i.e., the number of
"elementary mental discriminations" made by a programmer
(comprehender) per second. Psychologists had shown that
5 < S < 20 and Halstead determined empirically that S = 18
was a reasonable value.

Finally, the theory purports to quantify one other
externally observable property, namely, the total number of
"delivered" bugs in an implementation. The estimated bugs
B^ property is defined as

     B^ = L * E / E0
        = V / E0

where E0 is defined as "the mean number of elementary
mental discriminations between potential errors in
programming." The theory argues that E0 = 3000 is a
reasonable value. This number of bugs may be interpreted as
either the expected number of errors remaining in a
delivered program or the number of errors observed during
program testing; both interpretations have received some
empirical support.
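
The chain from volume and program level to difficulty,
effort, estimated time, and estimated bugs can be sketched
in Python as follows. The function names and sample values
are hypothetical; the constants 18 and 3000 are the values
quoted above, and the level passed in stands for whichever
estimate of program level is available.

```python
STROUD_RATE = 18      # elementary mental discriminations per second
E0 = 3000             # mean discriminations between potential errors


def difficulty(level):
    """D = 1 / L."""
    return 1 / level


def effort(V, level):
    """E = V * D = V / L, in elementary mental discriminations."""
    return V * difficulty(level)


def estimated_time(E, stroud_rate=STROUD_RATE):
    """Estimated time in seconds: E / S."""
    return E / stroud_rate


def estimated_bugs(V, e0=E0):
    """Estimated bugs: L * E / E0, which reduces to V / E0."""
    return V / e0


# Hypothetical implementation with V = 1320 bits and level 0.0625.
V, level = 1320.0, 0.0625
E = effort(V, level)
print(difficulty(level), E)
print(round(estimated_time(E), 1), round(estimated_bugs(V), 2))
```

For these values the difficulty is 16, the effort is 21120
discriminations, the estimated time is about 1173 seconds,
and the estimated number of bugs is 0.44.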

Calculation Methods

Because  the  validity  of  the  "program  level  prediction 
equation"  is  suspect  (as  discussed  above),  this  study 
employed  two  different  methods  for  calculating  software 
science  quantities:  one  relies  directly  upon  this  estimate, 
the  other  does  not. 

Both methods calculate exact values for some derived
properties via their defining formulas directly from the
implementation's basic parameters. The methods are
therefore identical with regard to the following measured
aspects: VOCABULARY, LENGTH, ESTIMATED LENGTH,
%DIFFERENCE(N,N^), VOLUME, INTELLIGENCE CONTENT, and
ESTIMATED BUGS. But, because reasonable values for unique
potential operand counts are generally unavailable (from
either the specification or the implementation) for programs
of the size considered in this study, both methods of
calculation can only approximate the remaining derived
properties by relying upon various estimates. Due to the
intrinsically high degree of interrelationship among the
software science quantities, it generally suffices to
approximate just one additional derived property via some
estimation technique; the remaining derived properties can
then all be approximated in turn via their defining formulas
from the known exact values plus the estimated value. The
methods therefore differ in their choice of quantity to be
estimated, in their estimation technique, and with regard to
the following measured aspects: PROGRAM LEVEL, DIFFICULTY,
POTENTIAL VOLUME, %DIFFERENCE(V*,I), LANGUAGE LEVEL, EFFORT,
and ESTIMATED TIME.

The first method relies upon a "theoretical" estimation
of the program level quantity. The estimated program level
L^ is calculated directly from the program's implementation
via its defining formula and then substituted as an
approximation for the (true) program level L. Under this
method the exact value for intelligence content I is, by
definition, always equal to the approximate value for
potential volume V*; hence it is pointless to examine the
%DIFFERENCE(V*,I) aspect under this method of calculation.
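
A minimal sketch of the first method follows. It assumes the standard software science definitions (V = N log2 η, L^ = (2/η1)(η2/N2), I = L^ V, D = 1/L, λ = L V*, E = V/L, T = E/S), which are taken here to match the defining formulas given earlier; the function name and the sample counts are hypothetical.

```python
import math


def theoretical_method(eta1, eta2, n1, n2, stroud_rate=18):
    """Approximate the derived properties by substituting L^ for L."""
    vocabulary = eta1 + eta2                      # eta
    length = n1 + n2                              # N
    volume = length * math.log2(vocabulary)       # V = N log2(eta)
    level_hat = (2.0 / eta1) * (eta2 / n2)        # L^, the estimated level
    intelligence = level_hat * volume             # I = L^ * V (exact)
    potential_volume = level_hat * volume         # V* ~ L * V with L ~ L^
    effort = volume / level_hat                   # E = V / L
    return {
        "volume": volume,
        "program_level": level_hat,
        "difficulty": 1.0 / level_hat,            # D = 1 / L
        "intelligence_content": intelligence,
        "potential_volume": potential_volume,
        "language_level": level_hat * potential_volume,  # lambda = L * V*
        "effort": effort,
        "estimated_time": effort / stroud_rate,   # T = E / S
    }


# Hypothetical basic parameters: 10 unique operators, 6 unique operands,
# 30 operator occurrences, 20 operand occurrences.
result = theoretical_method(eta1=10, eta2=6, n1=30, n2=20)
print(result)
```

The sketch makes the redundancy noted above visible: intelligence content and potential volume are computed by the same expression, so their percentage difference is identically zero under this method.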

The second method relies upon an "empirical" estimation
of the language level quantity. A program's language level
is approximated as the mean value of estimates for the
language levels of the segments comprising the program. An
estimate of each segment's language level can be calculated
directly from the implementation (via the defining formulas
for λ, V*, and V), using an estimate of the segment's
unique potential operand count η2* in addition to the
exact values of the segment's other basic parameters η1,
η2, N1, and N2. The unique potential operand estimate
is obtained by counting operands that are busy-on-entry or
busy-on-exit (as discussed above); this technique seems
quite reasonable when applied to segments, most of which are
small enough. Use of the mean estimate for λ across the
individual segments of an entire program was inspired by the
experimental treatment of language level given in Halstead's
book. Under this method of calculation, all of the derived
properties defined above are distinct and nonredundantly
calculated.
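
The second method can be sketched as follows. The formulas are the standard software science definitions (V = N log2 η, V* = (2 + η2*) log2(2 + η2*), L = V*/V, λ = L V*), assumed here to match those referred to above; the function names and the two sample segments are hypothetical.

```python
import math


def segment_language_level(eta1, eta2, n1, n2, eta2_star):
    """Estimate lambda for one segment from its basic parameters."""
    volume = (n1 + n2) * math.log2(eta1 + eta2)                    # V
    potential_volume = (2 + eta2_star) * math.log2(2 + eta2_star)  # V*
    level = potential_volume / volume                              # L = V* / V
    return level * potential_volume                                # lambda


def program_language_level(segments):
    """Mean of the per-segment language level estimates."""
    levels = [segment_language_level(*seg) for seg in segments]
    return sum(levels) / len(levels)


# Hypothetical segments: (eta1, eta2, N1, N2, eta2*), where eta2* counts
# the operands that are busy-on-entry or busy-on-exit.
segments = [(10, 6, 30, 20, 2), (4, 4, 8, 8, 2)]
print(program_language_level(segments))
```

Once the program's language level has been approximated this way, the remaining derived properties follow in turn from their defining formulas.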

This chapter describes the steps taken to guide the
planning, execution, and analysis of the experimental
investigation reported in this dissertation. The
investigative methodology outlined here was devised as a
vehicle for research in software engineering. It relies
upon established principles and techniques for scientific
research: empirical study, controlled experimentation, and
statistical analysis.

The central feature of the investigative methodology is
a "differentiation-among-groups-by-aspects" paradigm. The
research goal is to answer the question, what differences
exist among the treatment groups (which represent different
programming environments) as indicated by differences on
measured aspects (which reflect quantitative characteristics
of software phenomena)? This use of "difference
discrimination" as the analytical technique dictates a
statistical model of homogeneity hypothesis testing that
influences nearly every element of the investigative
methodology.

Other analytical techniques could have been employed:

    estimation of the magnitude of differences between
    experimental treatments,

    correlations between measured aspects across all
    experimental treatments,

    multivariate analysis (rather than multiple univariate
    analyses in parallel, as is the case here), or

    factor analysis (breakdown of variance in one aspect
    among the other measured aspects),

to name a few examples. These are useful techniques and may
be used at a later time to answer other research questions.
For the present investigation, difference discrimination was
chosen as a reasonable "first-cut" probe of the empirical
data collected for the research project; by taking this
conservative approach, information may be obtained to help
guide more refined probes in the future.

Although the methodology is built around running an
experiment, collecting data, and making statistical tests,
these activities (i.e., the execution phase) play a small
role within the overall investigative methodology, in
comparison to the planning and analysis phases. This is
readily apparent from the schematic in Figure 2, which
charts some of the relationships among the various elements
(or steps) of the investigative methodology. Another
feature of the investigative methodology is the careful
distinction made during the analysis phase between objective
results (the empirical scores for the metrics and the
statistical conclusions inferred from them) and subjective
results (interpretations of the objective results in light
of intuition, research goals, etc.).

The remainder of this chapter outlines the overall
method by defining each step and discussing how it was
applied. Further details of certain steps are given within
other chapters of the dissertation, as follows:

    Step 5   Research Frameworks        Chapter VIII
    Step 6   Experimental Design        Chapter III
    Step 7   Collected Data             Chapter III
    Step 10  Statistical Conclusions    Chapter VII
    Step 11  Research Interpretations   Chapter VIII

Step 1: Questions of Interest

Several questions of interest were initiated and
refined so that answers might be given in the form of
statistical conclusions and research interpretations.
Questions were formulated on the basis of several concerns:
(1) software development rather than software maintenance,
(2) a desire to assess the effectiveness of disciplined team
programming, in comparison to undisciplined team programming
and individual programming, (3) quantitatively measurable
aspects of the process and product, and (4) the analytical
technique of difference discrimination. The questions of
interest took the final form, "During software development,
what comparisons between the effects of the three
programming environments,

    (a) individual programming under an ad hoc approach,

    (b) team programming under an ad hoc approach,

    (c) team programming under a disciplined methodology,

appear as differences in quantitatively measurable aspects
of the software development process and product?
Furthermore, what kind of differences are exhibited and what
is the direction of these differences?"

Step 2: Research Hypotheses

Since the investigative methodology involves hypothesis
testing, it is necessary to have fairly precise statements,
called research hypotheses, which are to be either supported
or refuted by the evidence. The second step in the method
was to formulate these research hypotheses, disjoint pairs
designated null and alternative, from the questions of
interest.

A precise meaning was given to the notion of
"difference." The investigation considered both (a)
differences in central tendency or average value, and (b)
differences in variability around the central tendency, of
observed values of the quantifiable programming aspects. It
should be noted that this decision to examine both location
and dispersion comparisons among the experimental groups
brought a pervasive duality to the entire investigation
(i.e., two sets of statistical tests, two sets of
statistical results, two sets of conclusions, etc.--always
in parallel and independent of each other), since it
addresses both the expectancy and the predictability of
behavior under the experimental treatments.

Some vagueness was removed regarding the size of the
particular programming task by making explicit the implicit
restriction that completion of the task not be beyond the
capability of a single programmer working alone for a
reasonable period of time. Additionally, a large set of
programming aspects were specified; they are discussed in
Chapters IV and V. For each programming aspect there were
similar questions of interest, similar research hypotheses,
and similar experiments conducted in parallel.

The schema for the research hypotheses may be stated as
"In the context of a one-person-do-able software development
project, there < is not | is > a difference in the
< location | dispersion > of the measurements on programming
aspect < X > between individuals (AI), ad hoc teams (AT),
and disciplined teams (DT)." For each programming aspect
'X' in the set under consideration, this schema generates
two pairs of nondirectional research hypotheses, depending
upon the selection of 'is not' or 'is' corresponding to the
null and alternative hypothesis, and the selection of
'location' or 'dispersion' corresponding to the type of
difference.
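
The way the schema expands into two hypothesis pairs per aspect can be illustrated with a short Python sketch. The function name and the aspect name SEGMENTS are hypothetical and serve only as an example.

```python
GROUPS = "individuals (AI), ad hoc teams (AT), and disciplined teams (DT)"

SCHEMA = (
    "In the context of a one-person-do-able software development "
    "project, there {verb} a difference in the {kind} of the "
    "measurements on programming aspect {aspect} between " + GROUPS + "."
)


def research_hypotheses(aspect):
    """Return the null/alternative pair for each type of difference."""
    pairs = {}
    for kind in ("location", "dispersion"):
        pairs[kind] = {
            "null": SCHEMA.format(verb="is not", kind=kind, aspect=aspect),
            "alternative": SCHEMA.format(verb="is", kind=kind, aspect=aspect),
        }
    return pairs


hypotheses = research_hypotheses("SEGMENTS")
for kind, pair in hypotheses.items():
    print(kind, "null:", pair["null"])
    print(kind, "alternative:", pair["alternative"])
```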

Step 3: Statistical Model

The choice of a statistical model makes explicit
various assumptions regarding the experimental design, the
dependent variables, the underlying population
distributions, etc. Because the study involves a
homogeneity-of-populations problem with shift and spread
alternatives, the multi-sample model used here requires the
following: independent populations, independent and random
sampling within each population, continuous underlying
distributions for each population, homoscedasticity (equal
variances) of underlying distributions, and interval scale
of measurement [Conover 71, pp. 65-67] for each programming
aspect. Although random sampling was not explicitly
achieved in this study by rigorous sampling procedures, it
was nonetheless assumed on the basis of the apparent
representativeness of the subject pool and the lack of
obvious reasons to doubt otherwise. Due to the small sample
sizes, the unknown shape of the underlying distributions,
and the partially exploratory nature of the study, a
nonparametric statistical model was used.

Whenever statistics is employed to "prove" that some
systematic effect--in this case, a difference among the
groups--exists, it is important to measure the risk of
error. This is usually done by reporting a significance
level α [Conover 71, p. 79], which represents the
probability of deciding that a systematic effect exists when
in fact it does not. In the model, the hypothesis testing
for each programming aspect was regarded as a separate
independent experiment. Consequently, the significance
level is controlled and reported experimentwise (i.e., per
aspect). While the assumption of independence between such
experiments is not entirely supportable, this procedure is
valid as long as statistical inferences that couple two or
more of the programming aspects are avoided or properly
qualified. In this study, statements regarding
interrelationships among aspects are made only within the
interpretations in Chapter VIII.

Step 4: Statistical Hypotheses

The research hypotheses must be translated into
statistically tractable form, called statistical hypotheses.
A correspondence, governed by the statistical model, exists
between application-oriented notions in the research
hypotheses (e.g., typical performance of a programming team
under the disciplined methodology) and mathematical notions
in the statistical hypotheses (e.g., expected value of a
random variable defined over the population from which the
disciplined teams are a representative sample). Generally
speaking, only certain mathematical statements involving
pairs of populations are statistically tractable, in the
sense that standard statistical procedures are applicable.
Statements that are not directly tractable may be decomposed
into tractable (sub)components whose results are properly
recombined after having been decided individually.

In this study, the research hypotheses are concerned
with directional differences among three programming
environments. Since the corresponding mathematical
statements are not directly tractable, they were decomposed
into the set of seven statistical hypotheses pairs shown
below. As a shorthand notation for longer English
sentences, symbolic "equations" are used to express these
statistical hypotheses. The - symbol denotes negation.
The + symbol denotes pooling. The = , ≠ , and <
symbols indicate comparisons on the basis of either the
location or dispersion of the dependent variables.

The hypotheses pair

    null:               alternative:

    AI = AT = DT        -(AI = AT = DT)

addresses the existence of an overall difference among the
groups. However, due to the weak nondirectional
alternative, it cannot indicate which groups are different
or in what direction a difference lies. Standard
statistical practice prescribes that a successful test for
overall difference among three or more groups be followed by
tests for pairwise differences. The hypotheses pairs


    null:        alternative:

    AI = AT      AI ≠ AT  or  AI < AT  or  AT < AI
    AT = DT      AT ≠ DT  or  AT < DT  or  DT < AT
    AI = DT      AI ≠ DT  or  AI < DT  or  DT < AI

address the existence and direction of pairwise differences
between groups. The results of these pairwise comparisons
were used to refine the overall comparison. Data collected
for a set of experiments may often be legitimately reused to
"simulate" other closely related experiments, by combining
certain samples together and ignoring the original
distinction(s) between them. It is meaningful, in the
context of this study's experimental design, to compare any
two groups pooled against the third since (1) AI and AT are
both undisciplined, while DT is disciplined; (2) AT and DT
are both teams, and AI is individuals; and (3) under the
assumption that disciplined teams behave like individuals--
which is part of the study's basic premise--DT and AI can be
pooled and compared with AT acting as a control group. The
hypotheses pairs

    null:          alternative:

    AI+AT = DT     AI+AT ≠ DT  or  AI+AT < DT  or  DT < AI+AT
    AT+DT = AI     AT+DT ≠ AI  or  AT+DT < AI  or  AI < AT+DT
    AI+DT = AT     AI+DT ≠ AT  or  AI+DT < AT  or  AT < AI+DT

address the existence and direction of such pooled
differences. The results of these pooled comparisons were
used to corroborate the overall and pairwise comparisons.

Thus, for each programming aspect, the research
hypotheses pair corresponds to seven different pairs (null
and alternative) of statistical hypotheses. The results of
testing each set of seven hypotheses must be abstracted and
organized into one statistical conclusion using the first
research framework discussed in the next step.
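
The decomposition into seven statistical hypotheses pairs (one overall, three pairwise, three pooled) can be enumerated mechanically. The sketch below is illustrative only; the function name is hypothetical, and "!=" stands in for the not-equal symbol.

```python
from itertools import combinations

GROUPS = ("AI", "AT", "DT")


def statistical_hypotheses(groups=GROUPS):
    """Return a list of (null, [alternatives]) pairs for three groups."""
    a, b, c = groups
    overall = "{} = {} = {}".format(a, b, c)
    pairs = [(overall, ["-({})".format(overall)])]

    def two_sample(x, y):
        # Nondirectional alternative followed by both directional ones.
        null = "{} = {}".format(x, y)
        alts = ["{} != {}".format(x, y),
                "{} < {}".format(x, y),
                "{} < {}".format(y, x)]
        return (null, alts)

    # Pairwise comparisons between single groups.
    for x, y in combinations(groups, 2):
        pairs.append(two_sample(x, y))

    # Pooled comparisons: two groups pooled against the third.
    for x, y in combinations(groups, 2):
        (z,) = [g for g in groups if g not in (x, y)]
        pairs.append(two_sample(x + "+" + y, z))

    return pairs


hypothesis_pairs = statistical_hypotheses()
for null, alternatives in hypothesis_pairs:
    print(null.ljust(14), "  or  ".join(alternatives))
```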

Step 5: Research Frameworks

The research frameworks provide the necessary
organizational basis for abstracting and conceptualizing the
massive volume of statistical hypotheses (and statistical
results that follow) into a smaller and more intellectually
manageable set of conclusions. Three separate research
frameworks have been chosen: (1) the framework of possible
overall comparison outcomes for a given programming aspect,
(2) the framework of dependencies and intuitive
relationships among the various programming aspects
considered, and (3) the framework of basic suppositions
regarding expected effects of the experimental treatments on
the comparison outcomes for the entire set of programming
aspects. The first framework is employed in the statistical
conclusions step because it can be applied in a
statistically tractable manner, while the remaining two
frameworks are reserved for employment in the research
interpretations step since they are not statistically
tractable and involve subjective judgement.

Since a finite set of three different programming
environments (AI, AT, and DT) is being compared, there
exists the following finite set of thirteen possible overall
comparison outcomes for each aspect considered:

    AI = AT = DT

    AI < AT = DT  }
    AT = DT < AI  }  AI ≠ AT = DT

    AT < DT = AI  }
    DT = AI < AT  }  AT ≠ DT = AI

    DT < AI = AT  }
    AI = AT < DT  }  DT ≠ AI = AT

    AI < AT < DT  }
    AI < DT < AT  }
    AT < DT < AI  }  AI ≠ AT ≠ DT
    AT < AI < DT  }
    DT < AI < AT  }
    DT < AT < AI  }

There is a hierarchical lattice of increasing separation and
directionality among these possible overall comparison
outcomes as shown in Figure 3. These thirteen possible
overall comparison outcomes comprise the first research
framework and may be viewed as providing a complete "answer
space" for the questions of interest. It is clear that any
consistent set of two-way comparisons (such as represented
in the statistical hypotheses or statistical results) may be
associated with a unique one of these three-way comparisons.
This framework is the basis for organizing and condensing
the seven statistical results into one statistical
conclusion for each programming aspect considered.
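
The count of thirteen can be checked by enumerating every way of arranging three groups into ordered tiers, where groups in the same tier are "equal" and tiers are separated by "<". The sketch below is an illustration only; the function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def overall_outcomes(groups=("AI", "AT", "DT")):
    """Enumerate all directional three-way comparison outcomes."""
    outcomes = set()
    for perm in permutations(groups):
        for relations in product("=<", repeat=len(groups) - 1):
            tiers = [[perm[0]]]
            for relation, group in zip(relations, perm[1:]):
                if relation == "=":
                    tiers[-1].append(group)   # same tier as predecessor
                else:
                    tiers.append([group])     # start a strictly larger tier
            # frozensets make "AI = AT" and "AT = AI" the same outcome
            outcomes.add(tuple(frozenset(t) for t in tiers))
    return outcomes


def describe(outcome):
    """Render an outcome such as 'AI = AT < DT'."""
    return " < ".join(" = ".join(sorted(tier)) for tier in outcome)


outcomes = overall_outcomes()
described = sorted(describe(o) for o in outcomes)
print(len(outcomes))
for line in described:
    print(line)
```

The enumeration yields one null outcome, six partially differentiated outcomes, and six completely differentiated outcomes.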


Since a large set of interrelated programming aspects
are being examined, it would be desirable to summarize many
of the "per aspect" hypotheses and results into statements
which refer to several aspects simultaneously. For example,
average number of statements per segment is one aspect
directly dependent on two other aspects: number of segments
and number of statements. Other interrelationships are more
intuitive, less tractable, or only suspected, for example,
the "trade-off" between global variables and formal
parameters. A simple classification of the programming
aspects into groups of intuitively related aspects at least
provides a framework for jointly interpreting the
corresponding statistical conclusions in light of the
underlying issues by which the aspects themselves are
related. The programming aspects considered in this study
were classified according to a particular set of nine
higher-level programming issues (such as data variable
organization, for example); details are given in Chapter
VIII. This second research framework is the basis for
abstracting and interpreting what the study's findings
indicate about these higher-level programming issues, as
well as explicitly mentioning several individual
relationships among the programming aspects and their
conclusions.

Since the design of the experiments, the choice of
treatments, etc., were at least partially motivated by
certain general beliefs regarding software development
(e.g., "disciplined methodology reduces software development
costs"), it should be possible to explicitly state what
comparison outcomes among the experimental treatments were
expected a priori for which programming aspects. A list of
preplanned expectations (so-called "basic suppositions") for
the outcomes of each aspect's experiment would provide a
framework for evaluating how well the experimental findings
as a whole support the underlying general beliefs (by
comparing the actual outcomes with the basic suppositions
across all the programming aspects). Such a list of basic
suppositions was conceived prior to conducting the
experiments, and it constitutes the third research
framework; details are given in Chapter VIII. This
framework is the basis for interpreting the study's findings
as evidence in favor of the basic suppositions and general
beliefs.

Step 6: Experimental Design

The experimental design is the plan according to which
the experiment is actually executed. It is based upon the
statistical model and deals with practical issues such as
experimental units, treatments, local control, etc. The
experimental design employed for this study has been
discussed in considerable detail in Chapter III.

Step 7: Collected Data

The data needed to carry out the experimental design
was collected and processed to yield the information to
which the statistical test procedures were applied. Some
details of these activities have been given in Chapter III.

Step 8: Statistical Test Procedures

A statistical test procedure is a decision mechanism,
founded upon general principles of mathematical probability
and combinatorics and upon a specific statistical model
(i.e., requiring certain assumptions), which is used to
convert the statistical hypotheses together with the
collected data into the statistical results. As dictated by
the statistical model, the statistical tests used in the
study were nonparametric tests of homogeneity of populations
against shift alternatives for small samples. Nonparametric
tests are slightly more conservative (in rejecting the null
hypothesis) than their parametric counterparts;
nonparametric tests generally use the ordinal ranks
associated with a linear ordering of a set of scores, rather
than the scores themselves, in their computational formulas.
In particular, the standard Kruskal-Wallis H-test [Siegel
56, pp. 184-193] and Mann-Whitney U-test [Siegel 56, pp.
116-127] were employed in the statistical results step.
Ryan's Method of Adjusted Significance Levels [Kirk 66, p.
^7, pp. 495-497], a standard procedure for controlling the
experimentwise significance level when several tests are
performed on the same scores as one experiment, was also
employed in the statistical conclusions step.

The Kruskal-Wallis test is used in three-sample
situations to test an X = Y = Z null hypothesis; its test
statistic is computed as

    H = 12*[(Rx^2/nx)+(Ry^2/ny)+(Rz^2/nz)]/[n*(n+1)] - 3*(n+1)

where Rx , Ry , and Rz are the respective sums of the
ranks for scores from the X, Y, and Z samples; n equals
nx+ny+nz where nx , ny , and nz are the respective
sample sizes. The Mann-Whitney test is used in two-sample
situations to test an X = Y null hypothesis; its test
statistic is computed as

    U = min[ nx*ny + nx*(nx+1)/2 - Rx ,
             ny*nx + ny*(ny+1)/2 - Ry ]

where Rx , Ry , nx , and ny are defined as before.
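
Both test statistics can be computed directly from the formulas above. The following Python sketch is illustrative: the function names and sample scores are hypothetical, tied scores are given the mean of the ranks they span (an assumption, since the treatment of ties is not stated here), and no tie correction is applied to H.

```python
def rank_sums(*samples):
    """Rank all scores jointly and return the rank sum of each sample."""
    pooled = sorted(v for sample in samples for v in sample)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied scores share the mean.
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    return [sum(rank_of[v] for v in sample) for sample in samples]


def kruskal_wallis_h(x, y, z):
    """H = 12*[Rx^2/nx + Ry^2/ny + Rz^2/nz]/[n*(n+1)] - 3*(n+1)."""
    rx, ry, rz = rank_sums(x, y, z)
    n = len(x) + len(y) + len(z)
    total = rx ** 2 / len(x) + ry ** 2 / len(y) + rz ** 2 / len(z)
    return 12.0 * total / (n * (n + 1)) - 3.0 * (n + 1)


def mann_whitney_u(x, y):
    """U = min[nx*ny + nx*(nx+1)/2 - Rx, ny*nx + ny*(ny+1)/2 - Ry]."""
    rx, ry = rank_sums(x, y)
    nx, ny = len(x), len(y)
    return min(nx * ny + nx * (nx + 1) / 2.0 - rx,
               ny * nx + ny * (ny + 1) / 2.0 - ry)


# Hypothetical scores for three completely separated samples.
h = kruskal_wallis_h([1, 2], [3, 4], [5, 6])
u = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(h, u)
```

Converting either statistic into a critical level still requires the appropriate statistical table (or an exact permutation computation), which the sketch does not attempt.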

For every statistical test, there exists a one-to-one
mapping, usually given in statistical tables, between the
test statistic--a value completely determined by the sample
data scores--and the critical level. The critical level α̂
[Conover 71, p. 81] is defined as the minimum significance
level at which the statistical test procedure would allow
the null hypothesis to be rejected (in favor of the
alternative) for the given sample data. Thus critical level
represents a concise standardized way to state the full
result of any statistical test procedure. Two-tailed
rejection regions are applied for tests involving
nondirectional alternative hypotheses, and one-tailed
rejection regions are applied for tests involving
directional alternative hypotheses, so that the stated
critical level always pertains directly to the stated
alternative hypothesis. A decision to reject the null
hypothesis and accept the alternative is mandated if the
critical level is low enough to be tolerated; otherwise a
decision to retain the null hypothesis is made.

Ryan's procedure is used in situations involving
multiple pairwise comparisons, in order to properly account
for the fact that each pairwise test is made in conjunction
with the others, using the same sample data. The individual
critical levels α̂ obtained for each pairwise test in
isolation are adjusted to proper experimentwise critical
levels α̂' via the formula

    α̂' = [(r+1)*k / 2] * α̂

where k is the total number of samples, and r is the
number of (other) samples whose rank means fall between the
rank means of the particular pair of samples being compared.
A simple "minimax" step--taking the maximum of the several
adjusted pairwise critical levels, plus the overall
comparison critical level, which are all minimum
significance levels--completes the procedure, yielding a
single critical level associated jointly with the overall
and pairwise comparisons.
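
A minimal sketch of the adjustment and the "minimax" step follows. The function names and the critical levels in the example are hypothetical; the sketch applies the formula literally and does not cap adjusted values at 1.

```python
def ryan_adjust(critical_level, r, k):
    """Experimentwise critical level: [(r + 1) * k / 2] * critical_level.

    r is the number of other samples whose rank means fall between
    those of the pair being compared; k is the total number of samples.
    """
    return (r + 1) * k / 2.0 * critical_level


def joint_critical_level(overall, adjusted_pairwise):
    """Maximum of the overall level and the adjusted pairwise levels."""
    return max([overall] + list(adjusted_pairwise))


# Hypothetical example with k = 3 samples: two adjacent pairs (r = 0)
# and the extreme pair (r = 1).
k = 3
adjusted = [
    ryan_adjust(0.02, r=0, k=k),
    ryan_adjust(0.04, r=0, k=k),
    ryan_adjust(0.01, r=1, k=k),
]
print(adjusted)
print(joint_critical_level(0.04, adjusted))
```

With three samples the multiplier is 1.5 for adjacent pairs and 3 for the extreme pair, so the extreme comparison is penalized most heavily.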

These tests and procedures apply straightforwardly when
differences in location are considered. A slight
modification makes them applicable for differences in
dispersion: prior to ranking, each score value is simply
replaced by its absolute deviation from the corresponding
within-group sample median [Nemenyi et al. 77, pp. 266-270].
It should be noted that this modification results in only an
approximate method for solving a tough statistical problem,
namely, testing whether one population is more variable than
another [Nemenyi et al. 77, pp. 279-283]. The modification
is not statistically valid in the general case (it weakens
the power of the test procedures and can yield inaccurate
critical levels when testing for dispersion differences),
but every other available method also has serious
limitations. This method has been shown (empirically via
Monte Carlo techniques) to possess reasonable accuracy, as
long as the underlying distributions are fairly symmetrical,
and is readily adapted to the study's three-way comparison
situation.
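
The dispersion modification amounts to a simple transformation applied to each group before ranking. The sketch below is illustrative; the function name and the scores are hypothetical.

```python
from statistics import median


def dispersion_scores(sample):
    """Replace each score by its absolute deviation from the group median."""
    center = median(sample)
    return [abs(value - center) for value in sample]


# Hypothetical scores for the three groups on one programming aspect.
groups = {
    "AI": [1, 2, 3, 10],
    "AT": [4, 5, 6],
    "DT": [7, 7, 8, 9],
}
transformed = {name: dispersion_scores(s) for name, s in groups.items()}
print(transformed)
```

The transformed scores are then ranked and tested exactly as the original scores would be for a location comparison.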

Step 9: Statistical Results


A statistical result is essentially a decision reached
by applying a statistical test procedure to the set of
collected and refined data, regarding which one of the
corresponding pair (null, alternative) of statistical
hypotheses is indeed supported by that data. For each pair
of statistical hypotheses, there is one statistical result
consisting of four components: (1) the null hypothesis
itself; (2) the alternative hypothesis itself; (3) the
critical level, stated as a probability value between 0 and
1; and (4) a decision either to retain the null hypothesis
or to reject it in favor of (i.e., accept) the alternative
hypothesis.


By convention, the null hypothesis is that no
systematic difference appears to exist, and the alternative
hypothesis purports that some systematic difference exists.
The critical level is associated with erroneously accepting
the alternative hypothesis (i.e., claiming a systematic
difference when none in fact exists). The decision to retain
or reject is reached on the basis of some tolerable level of
significance, with which the critical level is compared to
see if it is low enough. In cases where a null hypothesis
is rejected, the appropriate directional alternative
hypothesis (if any) is used to indicate the direction of the
systematic difference, as determined by direct observation
from the sample medians in conjunction with a one-tailed
test.


Conventional practice is to fix an arbitrary
significance level (e.g., 0.05 or 0.01) in advance, to be
used as the tolerable level; critical levels then serve only
as stepping-stones toward reaching decisions and are not
reported. For this partially exploratory study, it was
more appropriate to fix a tolerable level only for
the purpose of a screening decision (simply to purge those
results with intolerably high critical levels) and to
explicitly attach any surviving critical level to each
statistical result. This unconventional practice yields
statistical results in a more meaningful and flexible form,
since the significance or error risk of each result may be
assessed individually, and results at other more stringent
significance levels may be easily determined. Furthermore,
the necessary information is retained for properly
recombining multiple related results on an experimentwise
basis in the statistical conclusions step.

The tolerable level of significance used throughout this study to screen critical levels was fixed at under 0.20. Although fairly high for a confirmatory study, it is reasonable for a partially exploratory study, such as this one, seeking to discover even slight trends in the data. A critical level of 0.20 means that the odds of obtaining test scores exhibiting the same degree of difference, due to random chance fluctuations alone, are one in five.

As an example, the seven statistical results for location comparisons on the programming aspect STATEMENT TYPE COUNTS\IF are shown below. (N.B. The asterisks will be explained in Step 10.)


null             alternative          critical    (screening)
hypothesis       hypothesis           level       decision

AI = AT = DT     ¬(AI = AT = DT)      .063        reject
AI = AT          AI < AT              .046        reject
AI = DT          AI ≠ DT              >.999       retain
AT = DT          DT < AT              .011        reject
AI+AT = DT       DT < AI+AT           .055        reject  *
AI+DT = AT       AI+DT < AT           .009        reject
AT+DT = AI       AT+DT ≠ AI           .335        retain  *

Observe that the stated decisions simply reflect the application of the 0.20 tolerable level to the stated critical levels. Results under more stringent levels of significance can be easily determined by simply applying a lower tolerable level to form the decisions; e.g., at the 0.05 significance level, only the AI < AT, DT < AT, and AI+DT < AT alternative hypotheses would be accepted; only the AI+DT < AT hypothesis would be accepted at the 0.01 level.
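The screening step can be sketched in a few lines of Python. This is only an illustration, not the procedure actually coded for the study: the class and function names are hypothetical, the critical level reported as >.999 is represented by 0.999, and the fifth critical level is garbled in the source copy (read here as .055; any value between .05 and .20 yields the same decisions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticalResult:
    """One statistical result: hypothesis pair plus its critical level."""
    null: str
    alternative: str
    critical_level: float

    def decision(self, tolerable_level: float) -> str:
        # Reject the null only when the critical level is under the tolerable level.
        return "reject" if self.critical_level < tolerable_level else "retain"


# Critical levels transcribed from the STATEMENT TYPE COUNTS\IF example.
RESULTS = [
    StatisticalResult("AI = AT = DT", "not (AI = AT = DT)", 0.063),
    StatisticalResult("AI = AT", "AI < AT", 0.046),
    StatisticalResult("AI = DT", "AI != DT", 0.999),      # reported as >.999
    StatisticalResult("AT = DT", "DT < AT", 0.011),
    StatisticalResult("AI+AT = DT", "DT < AI+AT", 0.055),  # garbled in source; stand-in value
    StatisticalResult("AI+DT = AT", "AI+DT < AT", 0.009),
    StatisticalResult("AT+DT = AI", "AT+DT != AI", 0.335),
]


def accepted(results, tolerable_level):
    """Alternative hypotheses accepted at the given tolerable level."""
    return [r.alternative for r in results
            if r.decision(tolerable_level) == "reject"]
```

Calling `accepted(RESULTS, 0.05)` or `accepted(RESULTS, 0.01)` re-screens the same results at a more stringent level without rerunning any test, which is the flexibility the attached critical levels are meant to provide.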

Step 10: Statistical Conclusions

The volume of statistical results is organized and condensed into statistical conclusions according to the prearranged research framework(s). A statistical conclusion is an abstraction of several statistical results, but it retains the same statistical character, having been derived via statistically tractable methods and possessing an associated critical level.

The first research framework mentioned above was employed to reduce the seven statistical results (with seven individual critical levels) for each programming aspect to a single statistical conclusion (with one overall critical level) for that aspect. The statement portion of a statistical conclusion is simply one of the thirteen possible overall comparison outcomes. Each overall comparison outcome is associated with a particular set of statistical results whose outcomes support the overall comparison outcome in a natural way. [...] versus AI) are in a sense orthogonal to the overall comparison outcome (DT = AI < AT), their results are considered irrelevant to this conclusion. The chart in Figure 4 shows exactly which results are associated with each conclusion: the relevant comparisons, the null hypotheses to be retained, and the alternative hypotheses to be accepted. The other portion of a statistical conclusion is the critical level associated with erroneously accepting the statement portion. It is computed from the individual critical levels of certain germane results.

A simple algorithm based on the chart in Figure 4 was used to generate the statistical conclusions (and compute the overall critical level) automatically from the statistical results. For each programming aspect, the algorithm compared the set of actual results obtained for the seven statistical hypotheses pairs to the set of results associated (in the chart) with each conclusion, searching for a match. Ryan's procedure was used to properly combine the individual critical levels for the overall result and the relevant pairwise results, by adjusting them via the formula and then taking their maximum. The critical levels for the relevant pooled results were then factored in, by a simple formula based on the multiplicative rule for the joint probability of independent events.

Continuing the example started in Step 9, the statistical results shown there for location comparisons on the STATEMENT TYPE COUNTS\IF aspect are reduced to the statistical conclusion DT = AI < AT with .078 critical level overall. The five results not marked with an asterisk in Step 9 match the five results associated above with the DT = AI < AT outcome. (Note that the other two marked results represent comparisons that are irrelevant to this conclusion.) The .046 and .011 critical levels for the two pairwise differences are adjusted to .070 and .033, respectively, and the maximum among those adjusted values and the .063 overall difference critical level is .070. The relevant pooled comparison critical level of .008 is factored in by taking the complement of the product of the complements:

1 - [(1 - .069)*(1 - .008)] = .078
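The combination just described can be sketched as follows. The source does not print Ryan's adjustment formula, so the form used here is an assumption: a pairwise critical level p is multiplied by k(r - 1)/2, where k is the number of groups and r the number of groups spanned by the pair when ordered by sample median. That form reproduces the adjustment of .011 to .033 (r = 3) and brings .046 to .069 (r = 2). All function and parameter names are hypothetical.

```python
from math import prod


def ryan_adjust(p, k, r):
    """Assumed form of Ryan's adjustment: scale p by k*(r - 1)/2, capped at 1."""
    return min(1.0, p * k * (r - 1) / 2)


def overall_critical_level(overall, pairwise, pooled, k=3):
    """Combine critical levels into one overall critical level.

    overall  -- critical level of the overall (three-way) difference
    pairwise -- list of (critical level, r) for the relevant pairwise differences
    pooled   -- list of critical levels for the relevant pooled comparisons
    """
    adjusted = [ryan_adjust(p, k, r) for p, r in pairwise]
    base = max([overall] + adjusted)
    # Complement of the product of the complements (independent-events rule).
    return 1 - (1 - base) * prod(1 - p for p in pooled)


# Figures from the STATEMENT TYPE COUNTS\IF example (as printed, hence rounded).
level = overall_critical_level(
    overall=0.063,
    pairwise=[(0.046, 2), (0.011, 3)],
    pooled=[0.009],
)
```

Because the printed critical levels are rounded to three places, the value computed from them agrees with the reported .078 only to within about one unit in the third decimal place.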

Thus, the statistical conclusions are in one-to-one correspondence with the research hypotheses and provide concise answers on a "per aspect" basis to the questions of interest. Further details and a complete listing of the statistical conclusions for this study are presented in Chapter VII.

Step 11: Research Interpretations

The final step in the method is to interpret the statistical conclusions in view of any remaining research framework(s), the researcher's intuitive understanding, and the work of other researchers. These research interpretations provide the opportunity to augment the objective findings of the study with the researcher's own professional judgment and insight. The second and third research frameworks mentioned above (namely, the intuitive relationships among the various programming aspects and the basic suppositions governing their expected outcomes) were considered important for this purpose. However, these particular research frameworks can only be utilized for the research interpretations, since they are not amenable to rigorous manipulation. Nonetheless, within these frameworks based upon intuitions about the software metrics and programming environments under consideration, the study bears some of its most interesting results and implications. Complete details and discussion of the research interpretations of this study appear in Chapter VIII.

VII. RESULTS

This chapter reports the objective results of the study, namely, the statistical conclusions for each programming aspect considered. In keeping with the empirical and statistical character of these conclusions, the tone of discussion here is purposely somewhat disinterested and analytical. All interpretive discussion is deferred to Chapter VIII, in accordance with the investigative methodology.

Each statistical conclusion is expressed in the concise form of a three-way comparison outcome "equation." It states any observed differences, and the directions thereof, among the programming environments represented by the three groups examined in the study: ad hoc individuals (AI), ad hoc teams (AT), and disciplined teams (DT). The equality AI = AT = DT expresses the null outcome that there is no systematic difference among the groups. An inequality, e.g., AI < AT = DT or DT < AI < AT, expresses a non-null (or alternative) outcome that there are certain systematic difference(s) among the groups in stated direction(s). A critical level value is also associated with each non-null (or alternative) outcome, indicating its individual reliability. This value is the probability of having erroneously rejected the null conclusion in favor of the alternative; it also provides a relative index of how pronounced the differences were in the sample data.

The remainder of this chapter consists of (a) presenting the full set of conclusions, (b) evaluating their impact as a whole, (c) exposing a "relaxed differentiation" view of the conclusions, (d) exposing a "directionless" view of the conclusions, and (e) individually highlighting a few of the more noteworthy conclusions.

Presentation

The complete set of statistical conclusions for both location and dispersion comparisons appears in Table 2, arranged by programming aspect. Instances of non-null (or alternative) conclusions, i.e., those indicating some distinction among the groups on the basis of a measured programming aspect, are listed by outcome in Tables 4.1 (for location comparisons) and 4.2 (for dispersion comparisons).

Examination of Table 2 immediately demonstrates that a large number of the programming aspects considered in this study, especially product aspects, failed to show any distinction between the groups. This low "yield" is not surprising, especially among product aspects, and may be attributed to the partially exploratory nature of the study, the small sample sizes, and the general coarseness of many of the aspects considered. The issue of these null outcome occurrences and their significance is treated more thoroughly in the next subsection, Impact Evaluation.

It is worth noting, however, that several of the null conclusions may indicate characteristics inherent to the application itself. As one example, the basic symbol-table/scanner/parser/code-generator nature of a compiler strongly influences the way the system is modularized and thus practically determines the number of modules in the final product (give or take some occasional slight variation due to other design decisions).

Impact Evaluation

The collective impact of these statistical conclusions may be objectively evaluated according to the following statistical principle [Tukey 69, pp. 84-85]. Whenever a series of statistical tests (or experiments) are made, all at a fixed level of significance (for example, 0.10), a corresponding percentage (in the example, 10%) of the tests are expected a priori to reject the null hypothesis in the complete absence of any true effect (i.e., due to chance alone). This expected rejection percentage provides a comparative index of the true impact of the test results as a whole (in the example, a 25% actual rejection percentage would indicate that a truly significant effect, other than chance alone, was operative).

The point here may be illustrated in terms of simple coin-tossing experiments. The nature of statistics itself dictates that, out of a series of 100 separate statistical tests of a hypothetically fair coin at the 0.05 significance level, roughly 5 of those tests would nonetheless indicate that the coin was biased; if only 6 out of 100 tests of a real coin indicate bias at the 0.05 level, those six results have very little impact, since the coin is behaving in a rather unbiased manner over the full set of tests.
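The coin-tossing illustration can be made concrete with a small calculation. The sketch below computes the expected number of chance rejections and, as an addition not used in the study itself, the binomial probability of seeing at least a given number of rejections when every null hypothesis is true. It assumes the tests are independent; the function names are hypothetical.

```python
from math import comb


def expected_rejections(n_tests, alpha):
    """Number of tests expected to reject a true null by chance alone."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# 100 tests of a fair coin at the 0.05 level: about 5 chance rejections expected.
chance_count = expected_rejections(100, 0.05)
# How often would 6 or more rejections arise from chance alone?
chance_of_six_or_more = prob_at_least(6, 100, 0.05)
```

Under these assumptions, six or more rejections out of 100 arise from chance alone more than a third of the time, which is why six such results carry very little impact.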

This same "multiplicity" principle applies to the statistical conclusions of the study, since they represent the outcomes of a series of separate tests and were assumed in the statistical model to be separate experiments. It is appropriate to evaluate the location and dispersion results separately, since they reflect two separate issues (expectancy and predictability) of software development behavior. It is also appropriate to evaluate the process and product results separately. Finally, it is only fair to evaluate the confirmatory aspects as a distinct subset of all aspects examined, since they alone had been honestly considered prior to collecting and analyzing the data.

Details of this impact evaluation for the study's objective results, broken down into the appropriate categories identified above, are presented in the following table. (This table is an excerpt from Table 3, which provides an extensive impact evaluation, broken down hierarchically according to all of the various dichotomies identified for the programming aspects.) The evaluation was performed at the α = 0.20 significance level used for screening purposes, hence the expected rejection percentage for any category was 20%. For each category of aspects, the table gives the number of (nonredundant) programming aspects, the expected (rounded to whole numbers) and actual numbers of rejections (of the null conclusion in favor of a directional alternative), and the expected and actual rejection percentages. An asterisk marks those categories demonstrating noticeable statistical impact (i.e., actual rejection percentage well above expected rejection percentage).


category                  n     exp.     act.     exp.      act.
                                # rej.   # rej.   rej. %    rej. %

location
  process               [entries illegible]
  product               179      36       41      20.0      22.9
  [label illegible]      29       6        9      20.0      31.0
  confirmatory only      35       7        9      20.0      25.7

n           : number of aspects
exp. # rej. : expected number of rejections
act. # rej. : actual number of rejections
exp. rej. % : expected rejection percentage
act. rej. % : actual rejection percentage
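The entries of each row follow directly from the number of aspects and the number of actual rejections. A minimal sketch, assuming the 0.20 screening level and the rounding conventions stated above (the function name is hypothetical):

```python
def impact_row(n_aspects, actual_rejections, alpha=0.20):
    """Return (expected rejections, expected %, actual %) for one category."""
    expected = round(n_aspects * alpha)        # rounded to a whole number
    expected_pct = round(100 * alpha, 1)
    actual_pct = round(100 * actual_rejections / n_aspects, 1)
    return expected, expected_pct, actual_pct


# The "confirmatory only" category: 35 aspects, 9 actual rejections.
confirmatory = impact_row(35, 9)
```

A category shows statistical impact when the last figure is well above the middle one; the source does not state a numeric threshold for "well above", so none is assumed here.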


The table shows that the location results, dealing with the expectancy of software development behavior, do have statistical impact in several subcategories. Process aspects have more impact than product aspects on the whole, but when tempered by consideration of the distinction between confirmatory and exploratory aspects, the study's location results bear strong statistical impact for both process and product. They are better explained as the consequence of some true effect related to the experimental treatments, rather than as a random phenomenon.


TABLE 3

Statistical Evaluation

[Row categories, in the order in which they appear: location (process, product, and a third unlabeled block) and dispersion (an unlabeled block and product). Each block lists rudimentary (confirmatory, exploratory), elaborative (confirmatory, exploratory), confirmatory, and exploratory aspects. Figures survive for one block only, whose parent category is not recoverable; a dash marks an illegible entry.]

category           number    expect.   actual    expect.   actual
                   of        num. of   num. of   reject.   reject.
                   aspects   reject.   reject.   percent   percent

rudimentary         124        -         -         -         -
  confirmatory       31        -         -         -         -
  exploratory        93        -         -         -         -
elaborative          65       13        13       20.0      20.0
  confirmatory        4        1         1       20.0      25.0
  exploratory        61       12        12       20.0      19.7
confirmatory         35        7         9       20.0      25.7
exploratory         154       31        34       20.0      22.1

It is also clear from the table that the dispersion results, dealing with the predictability of software development behavior, have little statistical impact in general. This is due primarily to the diminished power of statistical procedures used to test for dispersion differences, compounded by the small sample sizes involved and the coarseness of many of the programming aspects themselves. The lack of strong statistical impact in this area of the study does not mean that the dispersion issue is unimportant or undeserving of research attention, but rather that it is "a tougher nut to crack" than the location issue. The study's dispersion results are still worth pursuing, however, as possible hints of where differences might exist, provided this disclaimer regarding their impact is heeded.

A Relaxed Differentiation View

As described in Chapter VI, the research framework of possible three-way comparison outcomes provided the basis for converting the statistical results into the statistical conclusions. This framework has two inherent structural characteristics that may be exploited to make additional observations regarding the statistical conclusions. These structural characteristics and the supplemental views of the conclusions that they afford are described here and in the next subsection.

The first structural characteristic is that each completely differentiated outcome is related to a specific pair of partially differentiated outcomes, as shown in the lattice of Figure 3.1. For example, AI < AT < DT, a completely differentiated outcome, naturally weakens to either AI < AT = DT or AI = AT < DT, two partially differentiated outcomes.

[Table 4.1: Non-Null Conclusions, for Location Comparisons, arranged by outcome]

Each completely differentiated outcome consists of three pairwise differences (AI < AT, AT < DT, AI < DT in the example), while each partially differentiated outcome consists of only two pairwise differences plus one pairwise equality (AI < DT, AI < AT, AT = DT and AI < DT, AT < DT, AI = AT in the example). The "outer" difference of the completely differentiated outcome (AI < DT in the example) is common to both partially differentiated outcomes, while each partially differentiated outcome focuses attention on one of the two "inner" differences (AI < AT and AT < DT in the example) to the exclusion of the other "inner" difference, which is "relaxed" to an equality. Within a statistical environment or model which places a premium on claiming differences instead of equalities, a partially differentiated outcome is a safer statement, containing less error-prone information than a completely differentiated outcome. Since these outcomes represent statistical conclusions, the same data scores which support a completely differentiated outcome at a certain critical level also support each of the two related partially differentiated outcomes at lower critical levels.
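The relationship between a completely differentiated outcome and its two partially differentiated outcomes is mechanical, as the following sketch shows. It only rewrites the outcome "equations"; it does not compute critical levels. The function names and the tuple representation are illustrative assumptions.

```python
def relax(order):
    """Two partially differentiated outcomes of a completely differentiated one.

    order -- the three group names from smallest to largest,
             e.g. ("AI", "AT", "DT") for AI < AT < DT.
    """
    low, mid, high = order
    return [
        f"{low} < {mid} = {high}",   # upper "inner" difference relaxed
        f"{low} = {mid} < {high}",   # lower "inner" difference relaxed
    ]


def pairwise_differences(order):
    """Outer difference first, then the two inner differences."""
    low, mid, high = order
    return [f"{low} < {high}", f"{low} < {mid}", f"{mid} < {high}"]


relaxed = relax(("AI", "AT", "DT"))
```

For AI < AT < DT this yields AI < AT = DT and AI = AT < DT, the pair given in the example above; both retain the outer difference AI < DT.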

Thus, every completely differentiated conclusion may also be considered as two (more significant) partially differentiated conclusions, each of these three conclusions having equal and complete statistical legitimacy. The "outer" difference of a completely differentiated conclusion is, of course, stronger than either of its two "inner" differences; but the strengths of the two "inner" differences (relative to each other) will vary in accordance with the data scores and indeed are reflected in the significance levels of the two corresponding partially differentiated conclusions (relative to each other). Tables 5.1 and 5.2 give the details of this "relaxed differentiation" analysis for each of the completely differentiated conclusions found in the study, and an English paraphrase appears in the two paragraphs immediately below. All of the partially differentiated conclusions listed in these tables should be added to those presented in Tables 2 and 4; they deserve full consideration in any analysis or interpretation of the study's findings. However, in the case that one of a partially differentiated pair is noticeably stronger than the other, it is fair to consider only the stronger one for the purpose of analysis or interpretation dealing primarily with partially differentiated outcomes, since the study is mainly concerned with the most pronounced difference afforded by each aspect's data scores.

On location comparisons, four programming aspects yielded completely differentiated conclusions. They are "relaxed" to partially differentiated conclusions as follows:

1. From DT < AI < AT on the PROGRAM CHANGES aspect, the DT < AI = AT conclusion dwarfs the DT = AI < AT conclusion with respect to level of significance.

2. The DT < AT difference is more pronounced than the AI < DT difference from AI < DT < AT on the LINES aspect.

3. AT < DT < AI on the (SEGMENT,GLOBAL) USAGE PAIR RELATIVE PERCENTAGE\ENTRY aspect is more appropriately "relaxed" to the AT < DT = AI conclusion than to the AT = DT < AI conclusion.

4. The AT < DT and DT < AI differences from AT < DT < AI on the (SEGMENT,GLOBAL) USAGE PAIR RELATIVE PERCENTAGE\ENTRY\MODIFIED aspect are equally strong.

On dispersion comparisons, four programming aspects yielded completely differentiated conclusions. They are "relaxed" to partially differentiated conclusions as follows:

1. The DT < AI difference is much more pronounced than the AI < AT difference from DT < AI < AT on the MAXIMUM UNIQUE COMPILATIONS FOR ANY ONE MODULE aspect.

2. From DT < AI < AT on the STATEMENT TYPE COUNTS\RETURN aspect, the DT = AI < AT conclusion dwarfs the DT < AI = AT conclusion with respect to level of significance.

3. AI < DT < AT on the (SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS aspect is more appropriately "relaxed" to the AI < AT = DT conclusion than to the DT = AI < AT conclusion.

4. The AI < DT difference is more pronounced than the DT < AT difference from AI < DT < AT on the (SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS\NONENTRY\UNMODIFIED aspect.

A Directionless View

The second structural characteristic of the possible outcome framework is that the outcomes may be classified into another closely related set of directionless outcomes, as shown in the lattice of Figure 3.2. For example, AI < AT = DT and AT = DT < AI, two directional partially differentiated outcomes, both correspond to AI ≠ AT = DT, a nondirectional partially differentiated outcome. All six of the directional completely differentiated outcomes correspond to the single nondirectional completely differentiated outcome AI ≠ AT ≠ DT.
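The counts involved in this classification can be checked by enumeration. The sketch below lists every possible three-way comparison outcome as an ordered sequence of tied blocks (lowest block first) and then discards the ordering to obtain the directionless categories. The representation and function names are illustrative assumptions, not the notation of Figure 3.2.

```python
GROUPS = ("AI", "AT", "DT")


def weak_orderings(groups):
    """All ordered partitions of `groups` into tied blocks, lowest block first."""
    groups = tuple(groups)
    if not groups:
        return [()]
    result = []
    for mask in range(1, 2 ** len(groups)):
        # Choose a non-empty subset as the lowest block, then order the rest.
        first = frozenset(g for i, g in enumerate(groups) if (mask >> i) & 1)
        rest = [g for g in groups if g not in first]
        for tail in weak_orderings(rest):
            result.append((first,) + tail)
    return result


def directionless(outcome):
    """Forget the order of the blocks; keep only which groups are tied."""
    return frozenset(outcome)


OUTCOMES = weak_orderings(GROUPS)
partial = [o for o in OUTCOMES if len(o) == 2]    # e.g. AI < AT = DT
complete = [o for o in OUTCOMES if len(o) == 3]   # e.g. AI < AT < DT
```

The enumeration gives thirteen directional outcomes: one null, six partially differentiated, and six completely differentiated. Dropping direction leaves five categories, with the six partially differentiated outcomes collapsing to three and the six completely differentiated outcomes to one.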

By emphasizing just the existence and not the direction of distinctions between the treatment groups, these directionless outcome categories focus attention on the original research issue of discovering which observable programming aspects differentiate among the three programming environments. In particular, there are three nondirectional partially differentiated outcomes (each of the form "one group different from the other two which are similar"), and it is noteworthy to observe just what set of programming aspects supports each of these basic distinctions. (Table 4 is arranged so that the directional distinctions listed there can be readily coalesced by eye into directionless categories.) It is revealing to note that, with one exception, the directionless distinctions on location comparisons segregate cleanly along the process-versus-product dichotomy line: all of the product distinctions fall into the AI ≠ AT = DT or AT ≠ DT = AI categories, while the process distinctions consistently fall into the DT ≠ AI = AT category. Interestingly enough, the one exception is that a number of the cyclomatic complexity metric variations (which are product aspects) show the DT ≠ AI = AT directionless outcome (which otherwise characterizes only process aspect distinctions).

Individual Highlights

The purpose of this concluding section is to point out what seem to be the "top ten" (well, eleven and nine) most noteworthy conclusions from among the study's objective results. These conclusions are interesting individually, either because the programming aspect merits attention or because the difference in its expectancy or predictability is pronounced (as indicated by a low critical significance level) in the experimental sample data.

Noteworthy location distinctions are mentioned below.

1  •  According  to  the  D  T  <  A  I  =  AT  outcome  on  the  COMPUTER 
JOB  STEPS  aspect*  the  Disciplined  teams  used  very 
noticeably  fewer  computer  job  steps  (i.e.*  module 
compilations,  program  executions?  and  miscellaneous  j oo 
steps)  than  either  the  ad  hoc  individuals  or  the  ad  hoc 
teams. 

2m  This  same  difference  was  apparent  in  the  total  number  of 
module  comp  i  lations,  the  number  of  unique  (i.e.,  not 
identical  to  a  previous  compilation)  mpdule 
compilations,  the  number  of  program  executions,  and  the 
number  of  essential  job  steps  (i.e.,  unique  module 
compilations  plus  program  executions)*  according  to  the 
OT  <  AI  =  AT  outcomes  on  the  COMPUTER  JOB  STEPSVMODULE 
COMPILATION,  COMPUTER  JOB  STEP$\MODULE  COMPILATIONS 
UNIQUE,  COMPUTER  J03  S T EP $ \ PR OG R A M  EXECUTION,  and 
COMPUTER  J  0  9  S T E P S \E $ S E NT  I A L  aspects,  respectively. 

3.  According  to  the  OT  <  AI  =  AT  outcome  on  the  PROGRAM 
CHANGES  aspect,  the  disciplined  teams  required  fewer 
textual  revisions  to  build  and  debug  the  software  than 
the  ad  hoc  individuals  and  the  ad  hoc  teams. 

A.  There  was  a  definite  trend  for  the  ad  hoc  individuals  to 


have  produced  fewer  total  symbolic  lines  (including 
co\ents?  compiler  directives*  statements* 


dec  l  a\at i on s *  etc.)  than  the  disciplined  teams  who 

V 

produced  fewer  than  the  ad  hoc  teams*  according  to  the 
AI  <  OT  <  AT  outcome  on  the  LINES  aspect. 

)•  According  to  the  AI  <  AT  -  DT  outcome  on  the  SEGMENTS 

aspect,  the  ad  hoc  individuals  organized  their  software 
into  noticeably  fewer  routines  (i.e.*  functions  or 
procedures)  than  either  the  ad  hoc  teams  or  the 
o i sc i pi i ned  teams . 

>•  The  ad  hoc  individuals  displayed  a  trend  toward  having  a 
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greater  number  of  executable  statements  per  routine 
than  did  either  the  ad  hoc  teams  or  the  disciplinea 
teams?  according  to  the  AT  =  DT  <  AI  outcome  On  the 
AVERAGE  STATEMENTS  °ER  SEGMENT  aspect* 

7.  According  to  the  D  T  -  A  I  <  AT  outcomes  on  the  STATEMENT 
TYPE  COUNTS  \  IF  and  STATEMENT  TYPE  P  E  *  C  E  \T  A  6  E  \  I  F 
aspects?  both  the  ad  hoc  individuals  and  the 
disciplined  teams  coded  noticeably  fewer  IP  statements 
than  the  ad  hoc  teams,  in  terms  of  both  total  number 
ana  percentage  of  total  statements. 

7.  According  to  the  D  T  =  A  I  <  AT  outcome  on  the  DECISIONS 

aspectt  both  the  ad  hoc  individuals  and  the  disciplinea 
teams  tended  to  code  fewer  decisions  (i.e*,  IF,  wHILE, 
or  CASE  statements)  than  the  aa  hoc  teams, 
doth  the  ad  hoc  teams  and  the  disciplined  teams  declared 
a  noticeably  larger  number  of  data  variables  (i.e.? 
scalars  or  arrays  of  scalars)  than  the  ad  hoc 
individuals,  according  to  the  AI  <  AT  =  DT  outcome  on 
the  DATA  VARIABLES  aspect. 

10. According to the AT = DT < AI outcome on the DATA
VARIABLE SCOPE PERCENTAGES\NONGLOBAL\LOCAL aspect, the
ad hoc individuals had a larger percentage of local
variables compared to the total number of declared data
variables than either the ad hoc teams or the
disciplined teams.

11. There was a slight trend for both the ad hoc
individuals and the disciplined teams to have fewer
potential data bindings (i.e., possible communication
paths between segments via global variables, as allowed
by the software's modularization) than the ad hoc
teams, according to the DT = AI < AT outcome on the
(SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS\POSSIBLE aspect.
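
As a rough illustration of the potential data bindings named in
the last item, the sketch below counts them for a toy program. It
is only a sketch under stated assumptions: a possible binding is
taken to be an ordered triple (p, x, q) of two distinct segments p
and q that can both reach the same global x, and the segment and
variable names are hypothetical, not taken from the study.

```python
def possible_data_bindings(visible_globals):
    """Count (segment, global, segment) triples.

    visible_globals maps each segment name to the set of global
    variables that the modularization lets it access or modify.
    Assumption: bindings are ordered, so (p, x, q) and (q, x, p)
    are counted separately.
    """
    segments_by_global = {}
    for segment, names in visible_globals.items():
        for name in names:
            segments_by_global.setdefault(name, set()).add(segment)
    # A global reachable from n segments allows n * (n - 1) ordered pairs.
    return sum(len(s) * (len(s) - 1) for s in segments_by_global.values())


# Hypothetical compiler-like program with three segments.
example = {
    "scan": {"symtab", "line"},
    "parse": {"symtab"},
    "gen": {"symtab", "line"},
}
bindings = possible_data_bindings(example)  # symtab: 3 * 2, line: 2 * 1
```

Restricting which segments can see each global lowers the count,
which is the sense in which the measure depends on the software's
modularization.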

Noteworthy dispersion distinctions are mentioned below.

1. There was a noticeable difference in variability, with
the disciplined teams less than the ad hoc individuals
less than the ad hoc teams, in the maximum number of
unique compilations for any one module, according to
the DT < AI < AT outcome on the MAXIMUM UNIQUE
COMPILATIONS FOR ANY ONE MODULE aspect.

2. The ad hoc individuals exhibited noticeably greater
variation than either the ad hoc teams or the
disciplined teams in the number of miscellaneous job
steps (i.e., auxiliary compilations or executions of
something other than the final software project),
according to the AT = DT < AI outcome on the COMPUTER
JOB STEPS\MISCELLANEOUS aspect.

3. According to the DT = AI < AT outcome on the AVERAGE
SEGMENTS PER MODULE aspect, the ad hoc individuals and
the disciplined teams both exhibited noticeably less
variation in the average number of routines per module
than the ad hoc teams.

4. According to the DT = AI < AT outcomes on the STATEMENT
TYPE COUNTS\RETURN and STATEMENT TYPE PERCENTAGES\
RETURN aspects, the ad hoc teams showed rather
noticeably greater variability in the number (both raw
count and normalized percentage) of RETURN statements
coded than both the disciplined teams and the ad hoc
individuals.

5. In the number of calls to programmer-defined routines,
the ad hoc individuals displayed noticeably greater
variation than both the ad hoc teams and the
disciplined teams, according to the AT = DT < AI
outcome on the INVOCATIONS\NONINTRINSIC aspect.

6. According to the DT < AI = AT outcome on the DATA
VARIABLES SCOPE PERCENTAGES\GLOBAL\NONENTRY\MODIFIED
aspect, the disciplined teams displayed noticeably
smaller variation than either the ad hoc individuals or
the ad hoc teams in the percentage of commonplace
(i.e., ordinary scope and modified during execution)
global variables compared to the total number of data
variables declared.

7. The ad hoc individuals displayed noticeably less
variation in the number of formal parameters passed by
reference than both the ad hoc teams and the
disciplined teams, according to the AI < AT = DT
outcome on the DATA VARIABLE SCOPE COUNTS\NONGLOBAL\
PARAMETER\REFERENCE aspect.

8. According to the AI < DT < AT outcome on the
(SEGMENT,GLOBAL) POSSIBLE USAGE PAIRS aspect, there was
a noticeable difference in variability, with the ad hoc
individuals less than the disciplined teams less than
the ad hoc teams, for the total number of possible
segment-global usage pairs (i.e., occurrences of the
situation where a global variable could be modified or
accessed by a segment).

9. According to the DT = AI < AT outcome on the
(SEGMENT,GLOBAL,SEGMENT) DATA BINDINGS\POSSIBLE aspect,
the ad hoc teams tended toward greater variability than
either the ad hoc individuals or the disciplined teams
in the number of potential data bindings.

This chapter reports the interpretive results of the
study, namely the research interpretations based on the
conclusions presented in Chapter VII. The tone of
discussion here is purposely somewhat subjective and
opinionated, since the study's most important results are
derived from interpreting the experiment's immediate
findings in view of the study's overall goals. These
interpretations also express the researcher's own estimation
of the study's implications and general import according to
his professional intuitions about programming and software.

The interpretations presented here are neither
exhaustive nor unique. They only touch upon certain overall
issues and generally avoid attaching meaning to or giving
explanation for individual aspects or outcomes. It is
anticipated that the reader and other researchers might
formulate additional or alternative interpretations of the
study's factual findings, using their own intuitive
judgments.

Two distinct sets of research interpretations are
discussed in the remainder of this chapter. The first set
states general trends in the conclusions according to the
basic suppositions of the study. The second set states
general trends in the conclusions according to a
classification of the programming aspects which reflects
certain abstract programming notions (e.g., cost,
modularity, data organization, etc.).

According to Basic Suppositions

The study's "basic suppositions" (or "hypotheses") are
a set of simpleminded a priori expectations regarding
differences among the experimental programming environments
for location and dispersion comparisons on process and
product aspects. These basic suppositions are stated in the
following table:

+---------------------+----------------+----------------+
| Basic Suppositions  | for Location   | for Dispersion |
|                     | Comparisons    | Comparisons    |
+---------------------+----------------+----------------+
| on Process Aspects  | DT < AI = AT   | DT < AI = AT   |
+---------------------+----------------+----------------+
|                     | DT = AI < AT   | DT = AI < AT   |
| on Product Aspects  |      or        |      or        |
|                     | AT < DT = AI   | AT < DT = AI   |
+---------------------+----------------+----------------+

The basic suppositions are founded upon "general
beliefs" regarding software phenomena, which had been
formulated by the researcher prior to conducting the
experiment. These general beliefs state that

(a) methodological discipline is the key influence on the
general efficiency of the process;

(b) the disciplined methodology reduces the cost and
complexity of the process and enhances the
predictability of the process as well;

(c) the preferred direction for both location and dispersion
differences on process aspects is clear and
undebatable, because of the familiarity of the process
aspects and the direct applicability of expected values
and variances in terms of average cost estimates and
tightness of cost estimates;

(d) "mental cohesiveness" (or conceptual integrity [Brooks
75, pp. 41-50]) is the key influence on the general
quality of the product;

(e) a programming team is naturally burdened (relative to an
individual programmer) by the organizational overhead
and risk of error-prone misunderstanding inherent in
coordinating and interfacing the thoughts and efforts
of those on the team;

(f) the disciplined methodology induces an effective mental
cohesiveness, enabling a programming team to behave
more like an individual programmer with respect to
conceptual control over the program, its design, its
structure, etc., because of the discipline's
antiregressive, complexity-controlling [Belady and
Lehman 76, p. 245] effect that compensates for the
inherent organizational overhead of a team; and

(g) the preferred direction for both location and dispersion
differences on product aspects is not always clear,
because of the unfamiliarity of many of the product
aspects and a general lack of understanding regarding
the implication of dispersion for product aspects.

In view of the general beliefs and basic suppositions
stated above, each possible comparison outcome (cf. Figure
3) may be regarded as "voting" either for or against a given
basic supposition (or as "abstaining"), depending on whether
that outcome would substantiate or contravene the
corresponding general beliefs. For process aspects,

(1) outcome DT < AI = AT obviously affirms the
supposition;

(2) outcomes DT < AI < AT or DT < AT < AI, which are
completely differentiated variations of the
supposition's main theme, indirectly affirm the
supposition, especially when DT < AI = AT is the
stronger of the corresponding partially
differentiated outcome pair;

(3) outcome AI = AT = DT may negative the supposition,
or it may be considered an abstention for any one
of several reasons

(it is possible that (a) the aspect's
critical level is not low enough, so it
defaults to the null outcome; (b) the aspect
reflects something characteristic of the
application/task or another factor common to
all the groups in the experiment; or (c) the
aspect measures something fundamental to
software development phenomena in general and
would always result in the null outcome); and

(4) all other outcomes -- AI < AT < DT, AI < DT < AT,
AT < DT < AI, AT < AI < DT, AI ≠ AT = DT
(AI < AT = DT, AT = DT < AI), AT ≠ AI = DT
(AT < DT = AI, DT = AI < AT), and AI = AT < DT --
negative the supposition.

For product aspects,

(1) outcomes AT ≠ DT = AI (AT < DT = AI, DT = AI < AT)
obviously affirm the supposition;

(2) outcomes AI < DT < AT or AT < DT < AI, which may be
considered approximations to the supposition (DT
is distinct from AT but falls short of AI, due to
lack of experience or maturity in the disciplined
methodology), indirectly affirm the supposition,
especially when DT = AI < AT or AT < DT = AI
(respectively) is the stronger of the
corresponding partially differentiated outcome
pair;

(3) outcome AI = AT = DT may negative the supposition,
or it may be considered an abstention for any one
of several reasons

(it is possible that (a) the aspect's
critical level is not low enough, so it
defaults to the null outcome; (b) the aspect
reflects something characteristic of the
application/task or another factor common to
all the groups in the experiment; (c) the
aspect measures something fundamental to
software development phenomena in general and
would always result in the null outcome; or
(d) several of the study's hit-and-miss
collection of exploratory product aspects are
duds and may be ignored as useless software
measures);

(4) outcomes AI < AT < DT, AT < AI < DT, DT < AI < AT,
and DT < AT < AI negative the supposition;

(5) outcomes DT ≠ AI = AT (DT < AI = AT, AI = AT < DT)
negative the supposition, especially discrediting
the belief that "mental cohesiveness" is the key
influence on the product; and

(6) outcomes AI ≠ AT = DT (AI < AT = DT, AT = DT < AI)
negative the supposition, especially discrediting
the belief that disciplined methodology effectively
molds a team into an individual.
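
The voting rules above can be sketched as a small lookup. This is
an illustrative encoding only, not part of the study: the outcome
strings, the function names, and the three vote labels are
assumptions made for the sketch, and it handles only directional
outcomes written with "<" and "=" (a nondirectional outcome such
as AT ≠ DT = AI must be passed as one of its directional variants).

```python
def parse(outcome):
    """Turn 'DT = AI < AT' into ordered tiers: ({DT, AI}, {AT})."""
    return tuple(
        frozenset(group.strip() for group in tier.split("="))
        for tier in outcome.split("<")
    )


def rule_set(affirm, indirect):
    """Build the vote table for one kind of aspect."""
    return {
        "affirm": {parse(o) for o in affirm},
        "indirectly affirm": {parse(o) for o in indirect},
        "negative or abstain": {parse("AI = AT = DT")},
    }


RULES = {
    "process": rule_set(["DT < AI = AT"],
                        ["DT < AI < AT", "DT < AT < AI"]),
    "product": rule_set(["AT < DT = AI", "DT = AI < AT"],
                        ["AI < DT < AT", "AT < DT < AI"]),
}


def vote(outcome, kind):
    """Classify a directional outcome for a 'process' or 'product' aspect."""
    key = parse(outcome)
    for label, members in RULES[kind].items():
        if key in members:
            return label
    return "negative"  # every remaining outcome counts against the supposition
```

Parsing into ordered tiers makes the lookup insensitive to how
tied groups are written, so "AI = DT < AT" and "DT = AI < AT" are
treated as the same outcome.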

Thus, interpreting the study's findings according to
the basic suppositions consists of assessing how well the
research conclusions have borne out the basic suppositions
and how well the experimental evidence substantiates the
general beliefs. On the whole, the study's findings soundly
support the general beliefs presented above, although a few
conclusions exist that are inconsistent with the basic
suppositions or difficult to allay individually.

Support for the general beliefs was relatively stronger
on process aspects than on product aspects, and in location
comparisons rather than in dispersion comparisons.
Overwhelming support came in the category of location
comparisons on process aspects, in which the research
conclusions are distinguished by extremely low critical
levels and by near unanimity with the basic supposition. In
the category of dispersion comparisons on process aspects,
only two outcomes indicated any distinction among the
groups: one aspect supported the study's general beliefs and
one aspect showed an explainable exception to them. Fairly
strong support also came in the category of location
comparisons on product aspects, for which the only negative
evidence (besides the neutral AI = AT = DT conclusions)
appeared in the form of several AI ≠ AT = DT conclusions.
They indicate some areas in which the disciplined
methodology was apparently ineffective in modifying a team's
behavior toward that of an individual, probably due to a
lack of fully developed training/experience with the
methodology. Comparatively weaker support for the study's
beliefs was recorded in the category of dispersion
comparisons on product aspects. Although the basic
suppositions were borne out in a number of the conclusions,
there were also several distinctions of various forms which
contravene the basic suppositions.

Thus, according to this interpretation, the study's
findings strongly substantiate the claims that

(C1) methodological discipline is the key influence on
the general efficiency of the software development
process, and that

(C2) the disciplined methodology significantly reduces
the material costs of software development.

The claims that

(C3) mental cohesiveness is the key influence on the
general quality of the software development
product, that

(C4) relative to an individual, an ad hoc team is
mentally burdened by organizational overhead, and
that

(C5) the disciplined methodology offsets the mental
burden of organizational overhead and enables a
team to behave more like an individual relative to
the software product,

are moderately substantiated by the study's findings, with
particularly mixed evidence for dispersion comparisons on
product aspects.

It should be noted that there is a simpler, better-
supported interpretive model for the location results alone.
With the beliefs that a disciplined methodology provides for
the minimum process cost and results in a product which in
some aspects approximates the product of an individual and
at worst approximates the product developed by an ad hoc
team, the suppositions are DT < AI,AT with respect to
process and AI < DT < AT or AT < DT < AI with respect to
product. The study's findings support these suppositions
without exception.

According to Classification

It is desirable to examine the study's findings in view
of the way that higher-level programming issues are
reflected among the individual programming aspects. For
this purpose, the aspects considered in this study were
grouped into (so-called) programming aspect classes. Each
class consists of aspects which are related by some common
feature (for example, all aspects relating to the program's
statements, statement types, statement nesting, etc.), and
the classes are not necessarily disjoint (i.e., a given
aspect may be included in two or more classes). A unique
higher-level programming issue (in the example, control
structure organization) is associated with each class.

The programming aspects of this study were organized
into a hierarchy of nine aspect classes (with about 10%
overlap overall), outlined as follows:

Higher-Level Programming Issue:                  Class:

Development Process Efficiency
    Effort (Job Steps) . . . . . . . . . . . . . I
    Errors (Program Changes) . . . . . . . . . . II
Final Product Quality
    Gross Size . . . . . . . . . . . . . . . . . III
    Control-Construct Structure  . . . . . . . . IV
    Data Variable Organization . . . . . . . . . V
    Modularity
        Packaging Structure  . . . . . . . . . . VI
        Invocation Organization  . . . . . . . . VII
    Inter-Segment Communication
        Via Parameters . . . . . . . . . . . . . VIII
        Via Global Variables . . . . . . . . . . IX

The individual aspects comprising each class, together with
the corresponding conclusions, are listed by classes in
Tables 6.1 through 6.9. For each aspect class, it is
interesting to jointly interpret the individual outcomes in
an overall manner in order to see something of how these
higher-level issues are affected by team size and
methodological discipline.

Class I: Effort (Job Steps)

Within Class I (process aspects dealing with COMPUTER
JOB STEPS), there is strong evidence of an important
difference among the groups, in favor of the disciplined
methodology, with respect to average development costs. As
a class, these aspects directly reflect the frequency of
computer system activities (i.e., module compilations and
test program executions) during development. They are one
possible way of measuring machine costs, in units of basic
activities rather than monetary charges. Assuming that each
computer system activity involves a certain expenditure of
the programmer's time and effort (e.g., effective terminal
contact, test result evaluation), these aspects indirectly
reflect human costs of development (at least that portion
exclusive of design work).

The strength of the evidence supporting a difference
with respect to location comparisons within this class is
based on both (a) the near unanimity [8 out of 9 aspects] of
the DT < AI = AT outcome and (b) the very low critical
levels [<.025 for 5 aspects] involved. Indeed, the single
exception among the location comparisons (AI = AT = DT on
COMPUTER JOB STEPS\MODULE COMPILATIONS\IDENTICAL) is readily
explained as a direct consequence of the fact that all teams
made essentially similar usage (or nonuse, in this case,
since identical compilations were not uncommon) of the
on-line storage capability (for saving relocatable modules and
thus avoiding identical recompilations). This was expected
since all teams had been provided with identical storage
capability, but without any training or urging to use it.

The  conclusions  on  location  comparisons  within  this  class 
are  interpreted  as  demonstrating  that 

employment of the disciplined methodology by a
programming team reduces the average costs, both
machine and human, of software development, relative to
both individual programmers and programming teams not
employing the methodology.

Examination of the raw data scores themselves indicates the
magnitude of this reduction to be on the order of 2 to 1
(i.e., 50%) or better.

With respect to dispersion comparisons within this
class, the evidence generally failed to make any
distinctions among the groups [AI = AT = DT on 7 out of 9
aspects]. These null conclusions in dispersion comparisons
are interpreted as demonstrating that

variability of software development costs, especially
machine costs, is relatively insensitive to programming
team size and degree of methodological discipline.

The two exceptions on individual process aspects deserve
mention. The COMPUTER JOB STEPS\MISCELLANEOUS aspect showed
an AT = DT < AI dispersion distinction among the groups,
reflecting the variability (as expected) of individual
programmers relative to programming teams in the area of
building on-line tools to indirectly support software
development (e.g., stand-alone module drivers, one-shot
auxiliary computations, table generators, unanticipated
debugging stubs, etc.). The MAX UNIQUE COMPILATIONS F.A.O.
MODULE aspect showed a DT < AI = AT dispersion distinction
among the groups at an extremely low critical level [<.005],
reflecting the lower variation (increased predictability) of
the disciplined teams relative to the ad hoc teams and
individuals in terms of "worst case" compilation costs for
any one module. The additional AI < AT distinction for this
comparison is attributable to the fact that several teams in
group AT built monolithic single-module systems, yielding
rather inflated raw scores for this aspect.

Class  II:  Errors  (Program  Changes) 

Within Class II (the process aspect PROGRAM CHANGES),
there is strong evidence of an important difference among
the groups, again in favor of the disciplined methodology,
with respect to average number of errors encountered during
implementation. Chapter V contains a detailed explanation
of how program changes are counted. This aspect directly
reflects the amount of textual revision to the source code
during (postdesign) development. Claiming that textual
revisions are generally necessitated by errors encountered
while building, testing, and debugging software, independent
research [Dunsmore and Gannon 77] has demonstrated a high
(rank order) correlation between total program changes (as
counted automatically according to a specific algorithm) and
total error occurrences (as tabulated manually from
exhaustive scrutiny of source code and test results) during
software implementation. This aspect is thus a reasonable
measure of the relative number of programming errors
encountered outside of design work. Assuming that each
textual revision involves an expenditure of programmer
effort (e.g., planning the revision, on-line editing of
source code), this aspect indirectly reflects the level of
human effort devoted to implementation.

With respect to location comparison, the strength of
the evidence supporting a difference among the groups is
based on the very low critical level [<.005] for the
DT < AI = AT outcome. The additional trend toward AI < AT
is much less pronounced in the data. The interpretation is
that

the disciplined methodology effectively reduced the
average number of errors encountered during software
implementation.

This was expected since the methodology purposely emphasizes
the criticality of the design phase and subjects the
software design (code) to thorough reading and review prior
to coding (key-in or testing), enhancing error detection and
correction prior to implementation (testing).

With respect to dispersion comparison, no distinction
among the groups was apparent, with the interpretation that
variability in the number of errors encountered during
implementation was essentially uniform across all three
programming environments considered.

Class III: Gross Size

Within Class III (product aspects dealing with the
gross size of the software at various hierarchical levels),
there is evidence of certain consistent differences among
the groups with respect to both average size and variability
of size. As a class, these aspects directly reflect the
number of objects and the average number of component
(sub)objects per object, according to the hierarchical
organization (imposed by the programming language) of
software into objects such as modules, segments, data
variables, lines, statements, and tokens.

With respect to location comparisons within this class,
the non-null conclusions [7 out of 17 aspects] are nearly
unanimous [5 out of 7] in the AI < AT = DT outcome. The
interpretation is that individuals tend to produce software
which is smaller (in certain ways) on the average than that
produced by teams. It is unclear whether such spareness of
expression, primarily in segments, global variables, and
formal parameters, is advantageous or not. The two non-null
exceptions to this AI < AT = DT trend deserve mention, since
the one is only nominally exceptional and actually
supportive of the tendency upon closer inspection, while the
other indicates a size aspect in which the disciplined
methodology enabled programming teams to break out of the
pattern of distinction from individual programmers. The
AT = DT < AI outcome on AVERAGE STATEMENTS PER SEGMENT is a
simple consequence of the outcome for the number of
STATEMENTS [AI = AT = DT] and the outcome for the number of
SEGMENTS [AI < AT = DT] and it still fits the overall
pattern of AI ≠ AT = DT on location differences on size
aspects. On the LINES aspect, the DT = AI < AT distinction
breaks the pattern since DT is associated with AI and not
with AT. Since the number of statements was roughly the
same for all three groups, this difference must be due
mainly to the stylistic manner of arranging the source code
(which was free-format with respect to line boundaries), to
the amount of documentation comments within the source code,
and to the number of lines taken up in data variable
declarations.
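
The arithmetic behind the AVERAGE STATEMENTS PER SEGMENT remark
can be seen in a small sketch. The figures are hypothetical,
chosen only to mirror the reported pattern (equal statement
counts, fewer segments for the individuals); they are not data
from the study.

```python
def average_statements_per_segment(statements, segments):
    """Mean number of executable statements per routine."""
    return statements / segments


# Hypothetical totals: same number of statements, fewer segments for AI.
groups = {
    "AI": {"statements": 600, "segments": 20},
    "AT": {"statements": 600, "segments": 30},
    "DT": {"statements": 600, "segments": 30},
}
averages = {
    name: average_statements_per_segment(g["statements"], g["segments"])
    for name, g in groups.items()
}
```

With the numerator held equal, the group with the fewest segments
necessarily shows the largest average, which is the AT = DT < AI
ordering described above.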

With respect to dispersion comparisons within this
class, the few aspects which do indicate any distinction
among the groups [5 out of 17 aspects] seem to concur on the
AI = AT < DT outcome. This pattern, which associates
increased variation in certain size aspects with the
disciplined methodology, is somewhat surprising and lacks an
intuitive explanation in terms of the experimental
treatments. The exception DT = AI < AT on AVERAGE SEGMENTS
PER MODULE is really an exaggeration due to the fact of
several AT teams implementing monolithic single-module
systems, as mentioned above. The exception AT < DT = AI on
STATEMENTS is only a very slight trend, reflecting the fact
that the AT products rather consistently contained the
largest numbers of statements.

One overall observation for Class III is that while
certain distinctions did consistently appear (especially for
location but also for dispersion comparisons) at the middle
levels of the hierarchical scale (segments, data variables,
lines, and statements), no distinctions appeared at either
the highest (modules) or lowest (tokens) levels of size.

The null conclusions for size in modules and average module
size seem attributable to the fact that particular
programming tasks or application domains often have standard
designs at the topmost conceptual levels which strongly
influence the organization of software systems at this
highest level of gross size. In this case, the symbol-
table/scanning/parsing/code-generation design is extremely
common for language translation problems (i.e., compilers),
regardless of the particular parsing technique or symbol
table organization employed, and the modules of nearly every
system in the study directly reflected this common design.
The null conclusions for size in tokens are interpretable in
view of Halstead's software science concepts [Halstead 77],
according to which the program length N is predictable
from the number of basic input-output parameters η2* and
the language level λ. Since the functional specification,
application area, and implementation language were all fixed
in this study, both η2* and λ should be constant for
each of the software systems, implying virtually constant
program lengths N. Since program length N can be
regarded as roughly equivalent to the number of tokens in a
program, the study's data seem to support the software
science concepts in this instance.

Class IV: Control-Construct Structure

Within Class IV (product aspects dealing with the
software's organization according to statements, constructs,
and control structures), there are only a few distinctions
made between the groups.

With respect to location comparisons, the few [5 out of
24] aspects that showed any distinction at all were
unanimous in concluding DT = AI < AT. Essentially, three
particular issues were involved. The STATEMENT TYPE COUNTS\
IF, STATEMENT TYPE PERCENTAGES\IF, and DECISIONS aspects are
all related to the frequency of programmer-coded decisions
in the software product. Their common outcome DT = AI < AT
is interpreted as demonstrating an important area in which
the disciplined methodology causes a programming team to
behave like an individual programmer. The number of
decisions has been commonly accepted, and even formalized
[McCabe 76], as a measure of program complexity since more
decisions create more paths through the code. Thus, the
disciplined methodology effectively reduced the average
complexity from what it otherwise would have been. The
STATEMENT TYPE COUNTS\RETURN aspect indicates a difference
between the ad hoc teams and the other two groups. Since
the EXIT and RETURN statements are restricted forms of
GOTOs, this difference seems to hint at another area in
which the disciplined methodology improves conceptual
control over program structure. The STATEMENT TYPE COUNTS\
(PROC)CALL\INTRINSIC aspect also indicates a slight trend in
the area of the frequency of input-output operations, which
seems interpretable only as a result of stylistic
differences.

With respect to dispersion comparisons, only two
particular issues were involved. The STATEMENT TYPE COUNTS\
RETURN and STATEMENT TYPE PERCENTAGE\RETURN aspects both
indicated a strong DT = AI < AT difference, suggesting that
the frequency of these restricted GOTOs is an area in which
the disciplined methodology reduces variability, causing a
programming team to behave more like an individual
programmer. The STATEMENT TYPE COUNTS\(PROC)CALL and
STATEMENT TYPE COUNTS\(PROC)CALL\NONINTRINSIC aspects both
showed a DT < AI = AT distinction among the groups, which is
dealt with more appropriately within Class VII below.

In summary of Class IV, the interpretation is that the
functional component of control-construct organization is
largely unaffected by team size and methodological
discipline, probably due to the overriding effect of
project/task uniformity/commonality. However, two facets of
the control component that were influenced were the
frequency of decisions (especially IF statements) and the
frequency of restricted GOTOs (especially RETURN
statements). For these aspects, the disciplined methodology
seems to have altered the size of the program's control
structure (and reduced its complexity) from that of a team's
product to that of an individual's product.

Class V: Data Variable Organization

Within Class V (product aspects dealing with data
variables and their organization within the software), there
are several distinctions among the groups, with an overall
trend for both the location and dispersion comparisons.
Data variable organization was, however, not emphasized in
the disciplined methodology, nor in the academic course
which the participants in group DT were taking. With
respect to location comparisons, all aspects showing any
distinction at all were unanimous in concluding
AI ≠ AT = DT. The trend for individuals to differ from
teams, regardless of the disciplined methodology, appears
not only for the total number of data variables declared,
but also for data variables at each scope level (global,
parameter, local) in one fashion or another. The difference
regarding formal parameters is especially prominent, since
it shows up for their raw count frequency, their normalized
percentage frequency, and their average frequency per
natural enclosure (segment). With respect to dispersion
comparisons, the apparent overall trend for aspects which
show a distinction is toward the AI = AT < DT outcome. No
particular interpretation in view of the experimental
treatments seems appropriate. Exceptions to this trend
appeared for both the raw count and percentage of
call-by-reference parameters (both AI < AT = DT), as well as
two other aspects.

Table 6.5 Conclusions for Class V, Data Variable Organization

Class VI: Packaging Structure

Within Class VI (product aspects dealing with
modularity in terms of the packaging structure), there are
essentially no distinctions among the groups, except for two
location comparison issues. Most of the aspects in this
class are also members of Class III, Gross Size, but are
(re)considered here to focus attention upon the packaging
characteristics of modularity (i.e., how the source code is
divided into modules and segments, what type of segments,
etc.). The disciplined methodology did not explicitly
include (nor did group DT's course work cover) concepts of
modularization or criteria for evaluating good modularity;
hence, no particular distinctions among the groups were
expected in this area (Classes VI and VII).

With respect to location comparisons, the AI < AT = DT
outcome for the SEGMENTS aspects, along with the companion
outcome AT = DT < AI for the AVERAGE STATEMENTS PER SEGMENT
aspect (as explained under Class III above), indicates one
area of packaging that is apparently sensitive to team size.
Individual programmers built the system with fewer, but
larger (on the average), segments than either the ad hoc
teams or the disciplined teams. The AI < AT = DT outcome
for the AVERAGE NONGLOBAL VARIABLES PER SEGMENT\PARAMETER
aspect indicates that average "calling sequence" length,
curiously enough, is another area of packaging sensitive to
team size. With respect to dispersion comparisons, there
really were no differences, since the single non-null
outcome for AVERAGE SEGMENTS PER MODULE is actually a fluke
(raw scores for AT are exaggerated by the several monolithic
systems) as explained above. The overall interpretation for
this class is that

modularity, in the sense of packaging code into
segments and modules, is essentially unaffected by team
size or methodological discipline, except for a
tendency by individual programmers toward fewer, longer
segments than programming teams.

Class VII: Invocation Organization

Within Class VII (product aspects dealing with
modularity in terms of the invocation structure), there are
two distinction trends for location comparisons, but no
clear pattern for the dispersion comparison conclusions.
This class consists of raw counts and average-per-segment
frequencies for invocations (procedure CALL statements or
function references in expressions) and is considered
separately from the previous class since modularity involves
not only the manner in which the system is packaged, but
also the frequency with which the pieces might be invoked.
For the raw count frequencies of calls to intrinsic
procedures and intrinsic routines, the trend is for the
individuals and disciplined teams to exhibit fewer calls
than the ad hoc teams. These intrinsic procedures are
almost exclusively the input-output operations of the
language, while the intrinsic functions are mainly data type
conversion routines. The second trend for location
comparisons occurs for two aspects (a third aspect is
actually redundant) related to the average frequency of
calls to programmer-defined routines, in which the
individuals display higher average frequency than either
type of team. This seems coupled with group AI's preference
for fewer but larger routines, as noted above. With respect
to dispersion comparisons, several distinctions appear
within this class, but no overall interpretation is readily
apparent (except for a consistent reflection of a DT < AI
difference, with AT falling in between, leaning one side or
the other).

Class VIII: Inter-Segment Communication via Parameters

Within Class VIII (product aspects dealing with
inter-segment communication via formal parameters), there are
only a few distinctions among the groups. With respect to
location comparisons, the total frequency of parameters and
the average frequency of parameters per segment both show a
difference. The interpretation is that

the individual programmers tend to incorporate less
inter-segment communication via parameters, on the
average, than either the ad hoc or the disciplined
programming teams.

With respect to dispersion comparisons, in addition to the
difference in the raw count of parameters referred to in
Class V, there is a strong difference in the variability of
the number of call-by-reference parameters, also apparent in
the percentages-by-type-of-parameter aspects. The
interpretation is that

the individual programmers were more consistent as a
group in their use (in this case, avoidance) of
reference parameters than either type of programming
team.

Class IX: Inter-Segment Communication via Global Variables

Within Class IX (product aspects dealing with
inter-segment communication via global variables), there are
several differences among the groups, including two which
indicate the beneficial influence of the disciplined
methodology. This class is composed of aspects dealing with
absolute frequency of globals, average frequency of globals
per module, segment-global usage pairs (frequency of access
paths from segments to globals), and segment-global-segment
data bindings (frequency of communication paths between
segments via global variables).
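
The usage-pair and data-binding aspects named above can be
illustrated with a small sketch in Python. It is only an
illustration under stated assumptions: the segment and global
names are hypothetical, and the rule used for a data binding
(segment p modifies global g and a different segment q accesses
g) is an assumed working definition, not one quoted from the
study.

```python
from itertools import permutations


def usage_pairs(segments, in_scope, accesses):
    """Return (possible, actual, ratio) for segment-global usage pairs.

    in_scope: dict segment -> set of globals the segment could access
    accesses: dict segment -> dict global -> set of modes ("read"/"write")
    """
    possible = {(s, g) for s in segments for g in in_scope.get(s, set())}
    actual = {(s, g) for s in segments for g in accesses.get(s, {})}
    actual &= possible  # an actual access path must also be a possible one
    ratio = len(actual) / len(possible) if possible else 0.0
    return possible, actual, ratio


def data_bindings(segments, accesses):
    """Return triples (p, g, q): p modifies g and q accesses g (assumed rule)."""
    bindings = set()
    for p, q in permutations(segments, 2):
        for g, modes in accesses.get(p, {}).items():
            if "write" in modes and g in accesses.get(q, {}):
                bindings.add((p, g, q))
    return bindings


# Hypothetical system: three segments, two globals visible everywhere.
segments = ["A", "B", "C"]
in_scope = {s: {"X", "Y"} for s in segments}
accesses = {
    "A": {"X": {"write"}},
    "B": {"X": {"read"}, "Y": {"write"}},
    "C": {},
}

possible, actual, ratio = usage_pairs(segments, in_scope, accesses)
bindings = data_bindings(segments, accesses)
print(len(possible), len(actual), ratio, sorted(bindings))
```

In this toy system the actual-to-possible ratio plays the role of
the relative percentage category, and the single binding from A
to B through X is the only communication path via globals.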

With respect to location comparisons, there is the
AI < AT = DT distinction in sheer numbers of globals,
particularly globals which are modified during execution, as
noted in Class V. However, when averaged per module, there
appears to be no distinction in the frequency of globals.

The AI < AT = DT difference in the number of possible
segment-global access paths makes sense as the result of
group AI having both fewer segments and fewer globals. All
three groups had essentially similar average levels of
actual segment-global access paths, but several differences
appear in the relative percentage (actual-to-possible ratio)
category. These three instances of AT < DT = AI differences
indicate that the degree of "globality" for global variables
was higher for the individuals and the disciplined teams
than for the ad hoc teams. Finally, another AT ≠ DT = AI
difference appears for the frequency of possible
segment-global-segment data bindings, indicating a positive
effect of the disciplined methodology in reducing the possible
data coupling among segments. It may be noted that these last
two categories of aspects, segment-global usage relative
percentages and segment-global-segment data bindings, also
reflect upon the quality of modularization, since good
modularity should promote the degree of "globality" for
globals and minimize the data coupling among segments. The
interpretation here is that

certain aspects of inter-segment communication via
globals seem to be positively influenced, on the
average, by the disciplined methodology.

With respect to dispersion comparisons, there is a
diversity of differences in this class, without any unifying
interpretation in terms of the experimental treatments.

Miscellaneous

The cyclomatic complexity and software science metrics,
whose results have not been integrated into the two
interpretive frameworks discussed above, definitely merit
some interpretation.

On location comparisons, the results for cyclomatic
complexity measures exhibited a common underlying trend,
namely, DT < AT < AI. In fact, the non-null outcomes were
usually either AT = DT < AI or else DT < AI = AT. This says
that either the teams were differentiated from the
individuals or else the disciplined methodology was
differentiated from the ad hoc approach, depending on the
particular variation of cyclomatic complexity involved.
This corresponds well with the intuition that team
programming alone should force a general reduction of
cyclomatic complexity for individual routines, and that use
of the disciplined methodology within team programming
should promote this effect even further. The observed
results for the cyclomatic complexity metrics seem to
display this kind of behavior.

The generally weaker differentiation (i.e., larger
critical levels) observed for the cyclomatic complexity
aspects relative to other aspects considered in the study is
quite understandable in light of the fact that all 19
systems were coded in a structured-programming language
which greatly restricts potential control flow patterns. We
would expect cyclomatic complexity metrics to be more useful
in the context of unrestrictive programming languages such
as Fortran.
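
As a reminder of what the cyclomatic complexity aspects measure,
the following Python sketch computes McCabe's metric on a
control-flow graph as v(G) = e - n + 2p (edges, nodes, connected
components). The graph encoding and the example routine are
assumptions made for illustration; they do not reproduce the
study's measurement tools or its particular variations of the
metric.

```python
from collections import Counter


def cyclomatic_complexity(edges, components=1):
    """McCabe's v(G) = e - n + 2p for a control-flow graph given as edge pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


def decision_count(edges):
    """Count binary decisions: each extra out-edge of a node is one decision."""
    out_degree = Counter(src for src, _ in edges)
    return sum(degree - 1 for degree in out_degree.values() if degree > 1)


# Hypothetical routine: an IF-THEN-ELSE followed by a WHILE loop.
edges = [
    ("entry", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "while"),
    ("else", "while"),
    ("while", "body"),
    ("body", "while"),
    ("while", "exit"),
]

v = cyclomatic_complexity(edges)
d = decision_count(edges)
print(v, d)
```

For a single-entry, single-exit routine built from structured
constructs, v(G) equals the number of binary decisions plus one,
which is why a language that restricts control flow leaves the
metric little room to vary beyond the decision counts themselves.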

The results for software science quantities are
somewhat disappointing: surprisingly few distinctions among
the groups were obtained. On location comparisons, only the
vocabulary and estimated length metrics (the latter is a
function solely of the former) yielded non-null conclusions.
Their AI < AT = DT outcome corresponds to that obtained for
the number of segments and data variables, both of which
contribute heavily to the number of "operator/operands."
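
The remark that estimated length depends only on vocabulary can
be made concrete with Halstead's standard length estimator,
N^ = η1 log2 η1 + η2 log2 η2, where η1 and η2 are the numbers of
distinct operators and distinct operands and η = η1 + η2 is the
vocabulary. The Python sketch below uses made-up counts; how the
study's tooling classified tokens as operators or operands is not
assumed here.

```python
import math


def vocabulary(distinct_operators, distinct_operands):
    """Halstead vocabulary: eta = eta1 + eta2."""
    return distinct_operators + distinct_operands


def estimated_length(distinct_operators, distinct_operands):
    """Halstead estimated length: eta1*log2(eta1) + eta2*log2(eta2)."""
    def term(count):
        # A zero count contributes nothing and avoids log2(0).
        return count * math.log2(count) if count > 0 else 0.0

    return term(distinct_operators) + term(distinct_operands)


# Hypothetical counts for one program.
eta = vocabulary(8, 16)
n_hat = estimated_length(8, 16)
print(eta, n_hat)
```

Because the estimator takes only the two vocabulary components
as input, any group difference in vocabulary carries over
directly to estimated length.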

The overall interpretation here is that the software science
metrics appear to be insensitive to differences in how
software is developed. Maybe these measures, with their
actuarial nature, are sensitive only to gross factors in
software development (e.g., project, application area,
implementation language), all of which were held constant in
IX. SUMMARY CONCLUSIONS


A practical methodology was designed and developed for
experimentally and quantitatively investigating software
development phenomena. It was employed to compare three
particular software development environments and to evaluate
the relative impact of a particular disciplined methodology
(made up of so-called modern programming practices). The
experiments were successful in measuring differences among
programming environments and the results support the claim
that disciplined methodology effectively improves both the
process and product of software development.

One way to substantiate the claim for improved process
is to measure the effectiveness of the particular
programming methodology via the number of bugs initially in
the system (i.e., in the initial source code) and the amount
of effort required to remove them. (This criterion has been
suggested independently by Professor M. Shooman of
Polytechnic Institute of New York [Shooman 73].) Although
neither of these measures was directly computed, they are
each closely associated with one of the process aspects
considered in the study: PROGRAM CHANGES and COMPUTER JOB
STEPS\ESSENTIAL, respectively. The location comparison
statistical conclusions for both these aspects affirmed
DT < AI = AT outcomes at very low (<.01) significance
levels, indicating that on the average the disciplined teams
measured lower than either the ad hoc individuals or the ad
hoc teams, which both measured about the same. Thus, the
evidence collected in this study strongly confirms the
effectiveness of the disciplined methodology in building
reliable software efficiently.

The second claim, that the product of a disciplined
team should closely resemble that of a single individual
since the disciplined methodology assures a semblance of
conceptual integrity within a programming team, was
partially substantiated. In many product aspects the
products developed using the disciplined methodology were
either similar to or tended toward the products developed by
the individuals. In no case did any of the measures show
the disciplined teams' products to be worse than those
developed by the ad hoc teams. It is felt that the
superficiality of most of the product measures was chiefly
responsible for the lack of stronger support for this second
claim. The need for product measures with increased
sensitivity to critical characteristics of software is very
clear.


The results of these experiments will be used to guide
further experiments and will act as a basis for analysis of
software development products and processes in the Software
Engineering Laboratory at NASA's Goddard Space Flight Center
[Basili et al. 77]. The intention is to pursue this type of
empirical research, especially extending the study to more
sophisticated and promising software metrics.